Amritpal Singh (0851779)¶

Table of Contents

  1. Installing Required Packages
  2. General Understanding
  3. Importing Libraries
  4. Quick Information of data
  5. EDA
  6. Cleaning Data
  7. Encoding
  8. Distribution Overview
    • Distribution Visualization Function
  9. Imputing Target Variable
  10. Base Model Check Function
  11. Imputation Function
  12. Transforming Data
    • Imputing & Dropping
  13. Feature Selection
  14. Dealing with Outliers
  15. Splitting Data
  16. Models
    • Linear Regression
    • Decision Tree
    • Random Forest
  17. Metrics Comparison
    • Best Model
    • Best Metrics
    • Explanation
  18. Model Comparison
  19. Marketing Strategies
    • Existing Users
    • Potential Guests
    • New Hosts
    • Summary
  20. Visualization
    • Actual vs Predicted (Prices)
    • Reviews Importance in Price
    • Price by Accommodates
    • Seasonal Price Trend
    • Monthly Price Trend
    • Weekly Price Trend
  21. References

General understanding of data/files¶

  • We have 4 files in total
    • calendar.csv
      • Rows Count:- 7966127
      • Columns Count:- 7 (listing_id, date, available, price, adjusted_price, minimum_nights, maximum_nights)
    • listings.csv
      • Rows Count:- 21825
      • Columns Count:- 75 (id, listing_url, scrape_id, last_scraped, source, name, description, neighborhood_overview, picture_url, host_id, host_url, host_name, host_since, host_location, host_about, host_response_time, host_response_rate, host_acceptance_rate, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, neighbourhood, neighbourhood_cleansed, neighbourhood_group_cleansed, latitude, longitude, property_type, room_type, accommodates, bathrooms, bathrooms_text, bedrooms, beds, amenities, price, minimum_nights, maximum_nights, minimum_minimum_nights, maximum_minimum_nights, minimum_maximum_nights, maximum_maximum_nights, minimum_nights_avg_ntm, maximum_nights_avg_ntm, calendar_updated, has_availability, availability_30, availability_60, availability_90, availability_365, calendar_last_scraped, number_of_reviews, number_of_reviews_ltm, number_of_reviews_l30d, first_review, last_review, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value, license, instant_bookable, calculated_host_listings_count, calculated_host_listings_count_entire_homes, calculated_host_listings_count_private_rooms, calculated_host_listings_count_shared_rooms, reviews_per_month)
    • listings2.csv
      • Rows Count:- 21825
      • Columns Count:- 18 (id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, number_of_reviews_ltm, license)
    • reviews.csv
      • Rows Count:- 571853
      • Columns Count:- 6 (listing_id, id, date, reviewer_id, reviewer_name, comments)

Installing Packages¶

In [1]:
!pip install -r requirements.txt
Requirement already satisfied: joblib==1.4.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 1)) (1.4.2)
Requirement already satisfied: numpy==1.26.4 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 2)) (1.26.4)
Requirement already satisfied: pandas==2.2.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 3)) (2.2.2)
Requirement already satisfied: plotly==5.24.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 4)) (5.24.1)
Requirement already satisfied: xgboost==2.1.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 5)) (2.1.2)
Requirement already satisfied: seaborn==0.13.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 6)) (0.13.2)
Requirement already satisfied: matplotlib==3.9.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 7)) (3.9.2)
Requirement already satisfied: scikit-learn==1.5.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 8)) (1.5.2)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from pandas==2.2.2->-r requirements.txt (line 3)) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from pandas==2.2.2->-r requirements.txt (line 3)) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from pandas==2.2.2->-r requirements.txt (line 3)) (2024.1)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from plotly==5.24.1->-r requirements.txt (line 4)) (9.0.0)
Requirement already satisfied: packaging in c:\users\acer\appdata\roaming\python\python312\site-packages (from plotly==5.24.1->-r requirements.txt (line 4)) (24.1)
Requirement already satisfied: scipy in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from xgboost==2.1.2->-r requirements.txt (line 5)) (1.14.1)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (1.3.0)
Requirement already satisfied: cycler>=0.10 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (4.53.1)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (1.4.7)
Requirement already satisfied: pillow>=8 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (10.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (3.1.4)
Requirement already satisfied: threadpoolctl>=3.1.0 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from scikit-learn==1.5.2->-r requirements.txt (line 8)) (3.5.0)
Requirement already satisfied: six>=1.5 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from python-dateutil>=2.8.2->pandas==2.2.2->-r requirements.txt (line 3)) (1.16.0)

Importing Libraries¶

In [2]:
# Importing Libraries

import os
import ast
import joblib
import warnings
import datetime
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import plotly.express as px
from matplotlib import ticker
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.metrics import silhouette_score, mean_squared_error, mean_absolute_error, r2_score

# Iterative Imputer
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

# Suppressing all the warnings
warnings.filterwarnings('ignore')
In [3]:
# Storing all the data files in a list

data_files = ['calendar', 'listings', 'listings2', 'reviews']

Data Quick Information¶

In [4]:
# Running a quick analysis over the shape and columns of all of our datasets

for file in data_files:
    print("File:-", file)
    df = pd.read_csv(f'Data-AirBNB/{file}.csv', dtype = str)
    rows, columns = df.shape
    print("Rows:-", rows)
    print("Columns:-", columns)
    print(', '.join(df.columns))
    print()
File:- calendar
Rows:- 7966127
Columns:- 7
listing_id, date, available, price, adjusted_price, minimum_nights, maximum_nights

File:- listings
Rows:- 21825
Columns:- 75
id, listing_url, scrape_id, last_scraped, source, name, description, neighborhood_overview, picture_url, host_id, host_url, host_name, host_since, host_location, host_about, host_response_time, host_response_rate, host_acceptance_rate, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, neighbourhood, neighbourhood_cleansed, neighbourhood_group_cleansed, latitude, longitude, property_type, room_type, accommodates, bathrooms, bathrooms_text, bedrooms, beds, amenities, price, minimum_nights, maximum_nights, minimum_minimum_nights, maximum_minimum_nights, minimum_maximum_nights, maximum_maximum_nights, minimum_nights_avg_ntm, maximum_nights_avg_ntm, calendar_updated, has_availability, availability_30, availability_60, availability_90, availability_365, calendar_last_scraped, number_of_reviews, number_of_reviews_ltm, number_of_reviews_l30d, first_review, last_review, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value, license, instant_bookable, calculated_host_listings_count, calculated_host_listings_count_entire_homes, calculated_host_listings_count_private_rooms, calculated_host_listings_count_shared_rooms, reviews_per_month

File:- listings2
Rows:- 21825
Columns:- 18
id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, number_of_reviews_ltm, license

File:- reviews
Rows:- 571853
Columns:- 6
listing_id, id, date, reviewer_id, reviewer_name, comments
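
Reading every file fully as strings just to get its shape is memory-heavy for a file the size of calendar (about 8 million rows). As a hedged alternative, the sketch below streams a CSV in chunks and accumulates the row count. The helper name `quick_shape` and the in-memory sample are assumptions for illustration; they are not part of the original notebook.

```python
import io
import pandas as pd

def quick_shape(source, chunksize=100_000):
    """Return (row_count, column_names) by streaming the CSV in chunks."""
    rows = 0
    columns = None
    for chunk in pd.read_csv(source, dtype=str, chunksize=chunksize):
        if columns is None:
            # Column names are the same in every chunk, so keep the first set
            columns = list(chunk.columns)
        rows += len(chunk)
    return rows, columns

# Tiny in-memory stand-in for a real file such as calendar.csv (hypothetical data)
sample = io.StringIO(
    "listing_id,date,available\n"
    "1,2024-09-06,t\n"
    "1,2024-09-07,f\n"
    "2,2024-09-06,t\n"
)

rows, columns = quick_shape(sample, chunksize=2)
print("Rows:-", rows)
print("Columns:-", len(columns))
print(', '.join(columns))
```

With a real file, `source` would be a path string instead of the `StringIO` object; only one chunk is held in memory at a time.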

calendar file:¶

  1. listing_id: Unique identifier for each listing.
  2. date: Date for the listing's availability.
  3. available: Whether the listing is available on that date (true/false).
  4. price: Price of the listing per night.
  5. adjusted_price: Price adjusted based on discounts or surcharges.
  6. minimum_nights: Minimum number of nights a guest can book.
  7. maximum_nights: Maximum number of nights a guest can book.

listings file:¶

  1. id: Unique identifier for the listing.
  2. listing_url: URL to the listing on Airbnb.
  3. scrape_id: Identifier for the data scrape session.
  4. last_scraped: Date when the listing was last scraped.
  5. source: How the listing record was obtained (e.g., city scrape, previous scrape).
  6. name: Name/title of the listing.
  7. description: Detailed description of the listing.
  8. neighborhood_overview: Overview of the neighborhood where the listing is located.
  9. picture_url: URL of the listing's main image.
  10. host_id: Unique identifier for the host.
  11. host_url: URL to the host's profile on Airbnb.
  12. host_name: Name of the host.
  13. host_since: Date when the host joined Airbnb.
  14. host_location: Location of the host.
  15. host_about: Personal description written by the host.
  16. host_response_time: How quickly the host typically responds.
  17. host_response_rate: Host's response rate as a percentage.
  18. host_acceptance_rate: Host's acceptance rate for booking requests.
  19. host_is_superhost: Indicates if the host is a "Superhost" (true/false).
  20. host_thumbnail_url: URL to the host's thumbnail image.
  21. host_picture_url: URL to the host's main image.
  22. host_neighbourhood: Neighbourhood where the host resides.
  23. host_listings_count: Number of listings the host manages.
  24. host_total_listings_count: Total number of listings associated with the host.
  25. host_verifications: Verification methods completed by the host.
  26. host_has_profile_pic: Indicates if the host has a profile picture (true/false).
  27. host_identity_verified: Indicates if the host's identity is verified (true/false).
  28. neighbourhood: Neighbourhood where the listing is located.
  29. neighbourhood_cleansed: Standardized neighborhood name.
  30. neighbourhood_group_cleansed: Larger grouping of neighborhoods (if available).
  31. latitude: Latitude coordinate of the listing.
  32. longitude: Longitude coordinate of the listing.
  33. property_type: Type of property (e.g., apartment, house).
  34. room_type: Type of room offered (e.g., entire place, private room).
  35. accommodates: Number of guests the listing can accommodate.
  36. bathrooms: Number of bathrooms.
  37. bathrooms_text: Descriptive text about the bathroom setup.
  38. bedrooms: Number of bedrooms.
  39. beds: Number of beds.
  40. amenities: List of amenities provided.
  41. price: Price of the listing per night.
  42. minimum_nights: Minimum number of nights required for booking.
  43. maximum_nights: Maximum number of nights allowed for booking.
  44. minimum_minimum_nights: Shortest minimum night requirement across booking windows.
  45. maximum_minimum_nights: Longest minimum night requirement across booking windows.
  46. minimum_maximum_nights: Shortest maximum night limit across booking windows.
  47. maximum_maximum_nights: Longest maximum night limit across booking windows.
  48. minimum_nights_avg_ntm: Average minimum-night requirement over the next twelve months of the calendar.
  49. maximum_nights_avg_ntm: Average maximum-night limit over the next twelve months of the calendar.
  50. calendar_updated: How recently the calendar was updated.
  51. has_availability: Indicates if the listing has availability (true/false).
  52. availability_30: Number of available nights in the next 30 days.
  53. availability_60: Number of available nights in the next 60 days.
  54. availability_90: Number of available nights in the next 90 days.
  55. availability_365: Number of available nights in the next 365 days.
  56. calendar_last_scraped: Date when the calendar was last scraped.
  57. number_of_reviews: Total number of reviews for the listing.
  58. number_of_reviews_ltm: Number of reviews in the last 12 months.
  59. number_of_reviews_l30d: Number of reviews in the last 30 days.
  60. first_review: Date of the first review for the listing.
  61. last_review: Date of the most recent review.
  62. review_scores_rating: Overall rating score based on guest reviews.
  63. review_scores_accuracy: Accuracy rating based on guest reviews.
  64. review_scores_cleanliness: Cleanliness rating based on guest reviews.
  65. review_scores_checkin: Check-in process rating based on guest reviews.
  66. review_scores_communication: Communication rating based on guest reviews.
  67. review_scores_location: Location rating based on guest reviews.
  68. review_scores_value: Value-for-money rating based on guest reviews.
  69. license: License number for the listing (if applicable).
  70. instant_bookable: Indicates if the listing is available for instant booking (true/false).
  71. calculated_host_listings_count: Number of listings under the same host.
  72. calculated_host_listings_count_entire_homes: Number of entire home listings by the host.
  73. calculated_host_listings_count_private_rooms: Number of private room listings by the host.
  74. calculated_host_listings_count_shared_rooms: Number of shared room listings by the host.
  75. reviews_per_month: Average number of reviews the listing receives per month.

listings2 file:¶

  1. id: Unique identifier for the listing.
  2. name: Name/title of the listing.
  3. host_id: Unique identifier for the host.
  4. host_name: Name of the host.
  5. neighbourhood_group: Larger grouping of neighborhoods (if available).
  6. neighbourhood: Neighbourhood where the listing is located.
  7. latitude: Latitude coordinate of the listing.
  8. longitude: Longitude coordinate of the listing.
  9. room_type: Type of room offered (e.g., entire place, private room).
  10. price: Price of the listing per night.
  11. minimum_nights: Minimum number of nights required for booking.
  12. number_of_reviews: Total number of reviews for the listing.
  13. last_review: Date of the most recent review.
  14. reviews_per_month: Average number of reviews the listing receives per month.
  15. calculated_host_listings_count: Number of listings under the same host.
  16. availability_365: Number of available nights in the next 365 days.
  17. number_of_reviews_ltm: Number of reviews in the last 12 months.
  18. license: License number for the listing (if applicable).

reviews file:¶

  1. listing_id: Unique identifier for the listing being reviewed.
  2. id: Unique identifier for the review.
  3. date: Date when the review was posted.
  4. reviewer_id: Unique identifier for the reviewer.
  5. reviewer_name: Name of the reviewer.
  6. comments: Comments left by the reviewer.
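
The descriptions above suggest that the files relate through the listing identifier: `listing_id` in reviews (and calendar) appears to refer to `id` in listings. The sketch below illustrates that assumed relationship on two tiny made-up frames; `listings_demo`, `reviews_demo`, and the `review_rows` column are hypothetical names, and the real files are not read here.

```python
import pandas as pd

# Hypothetical miniature versions of listings.csv and reviews.csv
listings_demo = pd.DataFrame({
    "id": [1419, 8077],
    "room_type": ["Entire home/apt", "Private room"],
})
reviews_demo = pd.DataFrame({
    "listing_id": [1419, 1419, 8077],
    "id": [101, 102, 103],
    "comments": ["Great stay", "Lovely host", "Good location"],
})

# Count review rows per listing, then attach the counts to the listings
review_counts = (
    reviews_demo.groupby("listing_id")
    .size()
    .rename("review_rows")
    .reset_index()
)
merged = listings_demo.merge(
    review_counts, left_on="id", right_on="listing_id", how="left"
)
print(merged[["id", "room_type", "review_rows"]])
```

A left join keeps every listing, so a listing without any review rows would show a missing `review_rows` value rather than disappear.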

EDA¶

We will use the listings dataset for segmentation, and we will run the EDA and the modelling with the segmentation problem in mind.

In [5]:
# Reading the listings dataset, as this will be used for segmentation

df = pd.read_csv("Data-AirBNB/listings.csv")
listings = df.copy()
listings.head(5)
Out[5]:
id listing_url scrape_id last_scraped source name description neighborhood_overview picture_url host_id ... review_scores_communication review_scores_location review_scores_value license instant_bookable calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 1419 https://www.airbnb.com/rooms/1419 2.024090e+13 9/6/2024 previous scrape Beautiful home in amazing area! This large, family home is located in one of T... The apartment is located in the Ossington stri... https://a0.muscache.com/pictures/76206750/d643... 1565 ... 5.00 5.00 5.00 NaN f 1 1 0 0 0.05
1 8077 https://www.airbnb.com/rooms/8077 2.024090e+13 9/6/2024 previous scrape Downtown Harbourfront Private Room Guest room in a luxury condo with access to al... NaN https://a0.muscache.com/pictures/11780344/141c... 22795 ... 4.90 4.92 4.83 NaN f 2 1 1 0 0.92
2 26654 https://www.airbnb.com/rooms/26654 2.024090e+13 9/6/2024 city scrape World Class @ CN Tower, convention centre, The... CN Tower, TIFF Bell Lightbox, Metro Convention... There's a reason they call it the Entertainmen... https://a0.muscache.com/pictures/81811785/5dcd... 113345 ... 4.76 4.86 4.67 NaN f 5 5 0 0 0.25
3 27423 https://www.airbnb.com/rooms/27423 2.024090e+13 9/6/2024 city scrape Executive Studio Unit- Ideal for One Person Brand new, fully furnished studio basement apa... NaN https://a0.muscache.com/pictures/176936/b687ed... 118124 ... 5.00 4.87 4.87 NaN f 1 1 0 0 0.17
4 30931 https://www.airbnb.com/rooms/30931 2.024090e+13 9/6/2024 previous scrape Downtown Toronto - Waterview Condo Split level waterfront condo with a breathtaki... NaN https://a0.muscache.com/pictures/227971/e8ebd7... 22795 ... NaN NaN NaN NaN f 2 1 1 0 0.01

5 rows × 75 columns

In [6]:
# Shape of our dataset

listings.shape
Out[6]:
(21825, 75)

Let's review the features to see which ones are obvious candidates for removal.

In [7]:
listings.columns
Out[7]:
Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name',
       'description', 'neighborhood_overview', 'picture_url', 'host_id',
       'host_url', 'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'calendar_updated', 'has_availability',
       'availability_30', 'availability_60', 'availability_90',
       'availability_365', 'calendar_last_scraped', 'number_of_reviews',
       'number_of_reviews_ltm', 'number_of_reviews_l30d', 'first_review',
       'last_review', 'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'license', 'instant_bookable',
       'calculated_host_listings_count',
       'calculated_host_listings_count_entire_homes',
       'calculated_host_listings_count_private_rooms',
       'calculated_host_listings_count_shared_rooms', 'reviews_per_month'],
      dtype='object')

Looking at the features above, the following are obvious candidates for removal:

  • id
  • listing_url
  • scrape_id
  • last_scraped
  • source
  • name
  • description
  • neighborhood_overview
  • picture_url
  • host_id
  • host_url
  • host_name
  • host_location
  • host_about
  • host_thumbnail_url
  • host_picture_url
  • neighbourhood
  • neighbourhood_group_cleansed
  • calendar_updated
  • calendar_last_scraped

None of these features is needed for further analysis.

In [8]:
# Dropping unnecessary features from the dataset

listings.drop(
    columns = [
        'id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name', 'description', 'neighborhood_overview', 
        'picture_url', 'host_id', 'host_url', 'host_name', 'host_location', 'host_about', 
        'host_thumbnail_url', 'host_picture_url', 'neighbourhood', 
        'neighbourhood_group_cleansed', 'calendar_updated', 'calendar_last_scraped'
    ], inplace = True
)

listings.shape
Out[8]:
(21825, 55)

We have dropped the obvious features, but 55 still remain. Several of them hold values that are not suitable for a model or for encoding as they are, so we will transform them before moving forward.
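
As a rough illustration of the kind of transformation meant here, the sketch below converts three value formats that occur in the listings data: percentage strings such as '100%', 't'/'f' flags, and stringified lists such as "['email', 'phone']". The helper names and the small sample frame are assumptions made for this example; the transformations actually applied in this notebook may differ.

```python
import ast
import numpy as np
import pandas as pd

# Small stand-in frame using value formats seen in the listings data
sample = pd.DataFrame({
    "host_response_rate": ["100%", "77%", np.nan],
    "host_is_superhost": ["t", "f", np.nan],
    "host_verifications": ["['email', 'phone']", "[]", np.nan],
})

def percent_to_float(series):
    """Turn '77%' into 0.77; missing values stay NaN."""
    return series.str.rstrip("%").astype(float) / 100

def flag_to_float(series):
    """Turn 't' into 1.0 and 'f' into 0.0; missing values stay NaN."""
    return series.map({"t": 1.0, "f": 0.0})

def count_list_items(series):
    """Count entries in a stringified list; NaN when the value is missing."""
    return series.map(
        lambda v: len(ast.literal_eval(v)) if isinstance(v, str) else np.nan
    )

cleaned = pd.DataFrame({
    "host_response_rate": percent_to_float(sample["host_response_rate"]),
    "host_is_superhost": flag_to_float(sample["host_is_superhost"]),
    "verification_count": count_list_items(sample["host_verifications"]),
})
print(cleaned)
```

Missing values are deliberately left as NaN in this sketch so that a later imputation step can decide how to fill them.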

In [9]:
# Checking the unique values of all the features:

for col in listings.columns:
    print(f"{col} = {listings[col].unique()}")
    print()
host_since = ['8/8/2008' '6/22/2009' '4/25/2010' ... '8/31/2024' '9/2/2024' '9/3/2024']

host_response_time = [nan 'within a few hours' 'within an hour' 'within a day'
 'a few days or more']

host_response_rate = [nan '100%' '77%' '50%' '88%' '80%' '0%' '97%' '33%' '90%' '86%' '94%'
 '96%' '75%' '67%' '91%' '98%' '69%' '60%' '40%' '92%' '95%' '25%' '70%'
 '20%' '30%' '76%' '83%' '89%' '78%' '93%' '99%' '79%' '71%' '85%' '65%'
 '10%' '73%' '8%' '63%' '82%' '57%' '13%' '14%' '17%' '45%' '6%' '74%'
 '47%' '87%' '9%' '26%' '81%' '55%' '62%' '27%' '58%' '84%' '22%' '46%'
 '64%' '29%']

host_acceptance_rate = [nan '38%' '100%' '60%' '62%' '94%' '89%' '50%' '0%' '96%' '86%' '83%'
 '46%' '42%' '75%' '95%' '92%' '80%' '67%' '82%' '40%' '98%' '97%' '71%'
 '87%' '73%' '69%' '78%' '93%' '61%' '76%' '91%' '37%' '90%' '88%' '66%'
 '84%' '99%' '65%' '74%' '33%' '17%' '77%' '85%' '79%' '56%' '70%' '59%'
 '31%' '68%' '14%' '63%' '20%' '25%' '28%' '48%' '81%' '43%' '29%' '64%'
 '51%' '53%' '22%' '49%' '44%' '15%' '30%' '27%' '24%' '39%' '58%' '35%'
 '21%' '72%' '57%' '55%' '36%' '11%' '34%' '47%' '18%' '52%' '8%' '5%'
 '13%' '54%' '41%' '23%' '12%' '26%' '45%' '9%' '32%' '16%' '10%' '2%'
 '7%']

host_is_superhost = ['f' 't' nan]

host_neighbourhood = ['Commercial Drive' 'Harbourfront' 'Entertainment District'
 'Greenwood-Coxwell' 'Parkdale' 'The Beaches' 'Rosedale' 'Niagara'
 'High Park North' 'Scarborough City Centre' 'Downtown Toronto'
 'The Junction' 'Oakridge' 'Little Portugal' 'Studio District'
 'Garden District' 'Yorkville' 'The Annex' 'Fairbank' 'Deer Park'
 'The Pocket' 'Davisville' 'Willowdale' 'Fashion District'
 'Flemingdon Park' 'The Danforth' 'Amesbury' 'Oakwood' 'Dovercourt Park'
 'Trinity-Bellwoods' 'Roncesvalles' 'Palmerston/Little Italy' 'Mimico'
 'Riverdale' 'Woodbine Corridor' 'Cliffside' 'Broadview North'
 'Morningside' 'Cabbagetown' 'Saint Lawrence' 'Don Valley Village' nan
 'South Hill/Rathnelly' 'Wallace Emerson' 'Danforth Village' 'Corktown'
 'Westminster/Branson' 'Greek Town' 'Cedarvale Humewood' 'Dufferin Grove'
 'Islington' 'Parkwoods' 'Old East York' 'Agincourt'
 'Saint Andrew/Windfields' 'Yonge Eglinton' 'Bedford Park' 'Bendale'
 'Glen Park' 'Le Plateau' 'Mount Dennis' 'Newtonbrook'
 'Stonegate-Queensway' 'Clanton Park' 'Bayview' 'Henry Farm' 'Cliffcrest'
 'Kensington Market' 'New Toronto' 'Runnymede' 'Lytton Park' 'Forest Hill'
 'Guildwood' 'Alderwood' 'Lambton Baby Point' 'Woodbine/Lumsden'
 'Wychwood Park' "Tam O'Shanter" 'Parkview' "L'Amoreaux" 'Birch Cliff'
 'Casa Loma' 'Wexford/Maryvale' 'Lawrence Park' 'York University Heights'
 'Leaside' 'Pellam Park' 'The Kingsway' 'Eglinton East'
 'Financial District' 'Ionview' 'Swansea' 'Long Branch' 'Bayview Village'
 'Weston' 'Eringate' 'West Humber' 'Unionville' 'Jane and Finch'
 'Westmount' 'Morningside Heights' 'Rexdale' 'West Rouge' 'Richview'
 'Don Mills' 'Santa Monica' 'Markland Woods' 'Princess'
 'Rockcliffe Smythe' 'Dorset Park' 'Armour Heights' 'Thorncliffe Park'
 'Clairlea' 'Malvern' 'Pelmo Park' 'Etobicoke West Mall' 'Thistletown'
 'West Hill' 'Scarborough Junction' 'Mount Olive' 'The Westway'
 'Scarborough Village' 'Manse Valley' 'Keelesdale' 'Woburn'
 'Humber Valley' 'Highland Creek' 'The Elms' 'Nortown' 'Clinton Hill'
 'Pleasant View' 'Govalle' 'Sunnybrook' 'Victoria Village' 'Burnaby'
 'Thornhill' 'North Park' 'Hillcrest Village' 'Downsview' 'Samac'
 'Humbermede' ' Puntarenas residence' 'Port Union' 'West Oak Trails'
 'Recoleta' 'Upper East Side' 'Downtown Vancouver' 'Milliken' 'Calica'
 'Ocean Park' 'Humberlea' 'Humber Summit' 'South Cambie' 'Crescent Town'
 'Malvern West' 'Downtown Montreal' 'Merkaz HaIr' 'Queenston' 'South Core'
 'Beachborough' 'Lauderdale Isles' 'Beltline' 'Bellas Vistas' 'Sherkston'
 'Fenelon Falls' 'Erin Mills' 'Recreio dos Bandeirantes' 'Downtown Miami'
 'KDA Scheme 5' 'Cote-des-Neiges' 'Vanier' 'Lakeshore' 'Astrodome'
 'CONDOMINIOS CANTA MAR' 'Chappel East' 'Northside' 'University'
 'Ancaster' 'Cooksville' 'Kitsilano' 'Woodbridge' 'Marpole'
 'Berczy Village' 'Santo Agostinho' 'Lakeview'
 'Deutschstown Historic District' 'Bayview Glen' 'Antarayin'
 'Letitia Heights' 'Bolton' 'Port Dalhousie' 'Western Hill' 'Maple'
 'East Credit' 'Clearview' 'La Veleta' 'Somerset Brooke' 'Kamay'
 'Sage Hill' 'Valley Creek' 'Stipley' 'Waterdown' 'Grapeview' 'Venice'
 'Coboconk' 'KW Hospital' 'Landsdale' 'Keswick' 'Burnt River'
 'Shoreline West' 'Meadowvale' 'Knight' 'Central Vancouver'
 'Civic Hospital - Experimental Farm - Central Park'
 'Industrial Sector A and Keith' 'Khu phố 3' 'Willmott' 'East Windsor'
 'Phường 3' 'South Fort Lauderdale' 'Central Hamilton' 'Cadboro Bay'
 'Acton' 'Aldershot' 'Centretown' 'Midtown Toronto' 'Northglen'
 'Sainte-Rose' 'Crestview' 'Nautilus' 'Bebedero' 'Glendale' 'Donevan'
 'Dixie' 'Concord' 'Fairview' 'Hollywood Lakes' 'Barra do Cunhau'
 'Bonnington' 'Malton' 'Willow Beach' 'Symons Valley' 'ChampionsGate'
 'District des Riverains' 'High Park-Swansea' 'Notre-Dame-de-Grace'
 'White Oaks' 'Kedron' 'Allapattah' 'Downtown' 'Pinheiros'
 'Wismer Commons' 'Little Havana' 'Rosemary District' 'Port Sydney'
 'Lower Mount Royal' 'Castle Green' 'South Beach' 'Silvertown' 'West Bend'
 'Bedford-Stuyvesant' 'Uptown Core' 'Crescent Heights' 'Port Credit'
 'Old Malton Village' 'Playa Pelada' 'Cambuí' 'Streetsville' 'Sector B'
 'Willoughby' 'Glenridge' 'Windfields' 'Flatlands' 'Central City'
 'Victoria Island' 'West End' 'Notting Hill' 'Paradise Valley Village'
 'La Florida' 'Beverley Glen' 'Historic Old Town' 'LB of Islington'
 'Varna Center' 'Spring Valley' 'South Los Angeles' 'Far North Dallas'
 'Zona Hotelera' 'Mount Hope' 'Central Oshawa'
 'Afton Oaks / River Oaks Area' 'La Bainerie' 'Hamilton Road'
 'Normanhurst' 'Country Hills East' 'Whitmore Park' 'Saint-Henri' 'Medway'
 'Westover Hills' 'Carling' 'Beechwood' 'Evanston' 'Paia' 'Paquita'
 'Jackson' 'Heritage Valley' 'North Glenora' 'North End East'
 'Golf Club Manor' 'Reunion' 'East Vancouver' 'Highland Lakes'
 'Victorian District - East' 'Sandpointe' 'Downtown Dartmouth'
 'Fifth by Northwest' 'Sainte-Dorothée' 'Tempo' "Za'abeel 1"
 'Saint-Timothée' 'West Oakville' 'Burj Residence Phase I & II' 'Mineola'
 'Riverside' 'Kerrisdale' 'Durand' 'Westboro']

host_listings_count = [  1.   2.   5.   4.   3.   9.   8.  19.   6.   7. 103.  34.  11.  14.
  16.  10.  13.  22.  25.  17.  12.  29.  15.  nan  82.  38.  33.  26.
  18. 105.  21.  27. 284.  36. 484.  46.  30.  32.  47.  39.  20. 140.
  24.  44.  28.  35.  50.  51.  63.  45.  95.  75.  84.  52.  31.  83.
  23.  89.  96.  49. 182.  48.  37.  55. 147.  90. 152.  54.  41.]

host_total_listings_count = [  1.   3.  10.   5.   6.  19.  18.   2.   8.   4.   7.   9.  24.  14.
 180.  11.  41.  17.  29.  16.  13.  96.  27.  26.  45.  60.  32.  33.
  nan  23.  12.  54.  37.  20. 106.  46.  53.  55. 109.  84.  15.  67.
 268.  30.  22.  62.  21.  42.  34.  39.  28. 312.  44.  59.  40. 619.
  50. 334.  38.  79.  31.  48. 121. 150.  86.  61. 123.  63.  47. 216.
  35.  25. 555.  43.  49.  66.  75. 194.  36. 100.  88. 176. 172.  57.
  52. 108.  58. 219. 242. 190. 164. 101.  94.  51.  91. 171.]

host_verifications = ["['email', 'phone']" "['email', 'phone', 'work_email']" "['phone']"
 "['phone', 'work_email']" "['email']" "['work_email']" '[]'
 "['email', 'work_email']" nan]

host_has_profile_pic = ['t' 'f' nan]

host_identity_verified = ['t' 'f' nan]

neighbourhood_cleansed = ['Little Portugal' 'Waterfront Communities-The Island' 'South Riverdale'
 'South Parkdale' 'The Beaches' 'Rosedale-Moore Park'
 'Bay Street Corridor' 'Church-Yonge Corridor' 'Niagara' 'High Park North'
 'Woburn' 'Junction Area' 'Oakridge' 'Cabbagetown-South St.James Town'
 'Annex' 'Caledonia-Fairbank' 'Casa Loma' 'North St.James Town'
 'Blake-Jones' 'Moss Park' 'Mount Pleasant West' 'Willowdale East'
 'Palmerston-Little Italy' 'Flemingdon Park' 'East End-Danforth'
 'Brookhaven-Amesbury' 'Oakwood Village'
 'Dovercourt-Wallace Emerson-Junction' 'Trinity-Bellwoods' 'Roncesvalles'
 'Mimico (includes Humber Bay Shores)' 'Woodbine Corridor'
 'Birchcliffe-Cliffside' 'Broadview North' 'Morningside'
 'Kensington-Chinatown' 'High Park-Swansea' 'Don Valley Village'
 'Danforth' 'Newtonbrook West' 'Playter Estates-Danforth'
 'Greenwood-Coxwell' 'Regent Park' 'Dufferin Grove' 'North Riverdale'
 'Humewood-Cedarvale' 'Mount Pleasant East' 'Taylor-Massey' 'University'
 'Islington-City Centre West' 'Parkwoods-Donalda' 'Yonge-St.Clair'
 'Old East York' 'Corso Italia-Davenport' 'Agincourt South-Malvern West'
 'St.Andrew-Windfields' 'Yonge-Eglinton' 'Lawrence Park North' 'Bendale'
 'Englemount-Lawrence' 'Mount Dennis' 'Willowdale West'
 'Stonegate-Queensway' 'Rockcliffe-Smythe' 'Clanton Park'
 'Bayview Woods-Steeles' 'Bayview Village' 'Cliffcrest' 'New Toronto'
 'Agincourt North' 'Etobicoke West Mall' 'Bedford Park-Nortown'
 'Forest Hill South' 'Guildwood' 'Alderwood' "L'Amoreaux"
 'Lambton Baby Point' 'Woodbine-Lumsden' 'Danforth East York'
 'Bridle Path-Sunnybrook-York Mills' 'Wychwood'
 'Runnymede-Bloor West Village' "Tam O'Shanter-Sullivan"
 'Lansing-Westgate' 'Long Branch' 'Steeles' 'Wexford/Maryvale'
 'Lawrence Park South' 'York University Heights' 'Briar Hill-Belgravia'
 'Westminster-Branson' 'Leaside-Bennington' 'Hillcrest Village'
 'Weston-Pellam Park' 'Bathurst Manor' 'Kingsway South' 'Ionview'
 'Downsview-Roding-CFB' 'Weston' 'Pelmo Park-Humberlea'
 'Clairlea-Birchmount' 'Eglinton East' 'Yorkdale-Glen Park'
 'Eringate-Centennial-West Deane' 'West Humber-Clairville' 'Kennedy Park'
 'Newtonbrook East' 'Black Creek' 'Beechborough-Greenbrook'
 'Edenbridge-Humber Valley' 'Rouge' 'West Hill' 'Rexdale-Kipling'
 'Willowridge-Martingrove-Richview' "O'Connor-Parkview" 'Victoria Village'
 'Henry Farm' 'Banbury-Don Mills' 'Markland Wood' 'Princess-Rosethorn'
 'Dorset Park' 'Kingsview Village-The Westway' 'Keelesdale-Eglinton West'
 'Thorncliffe Park' 'Scarborough Village' 'Malvern' 'Pleasant View'
 'Thistletown-Beaumond Heights' 'Mount Olive-Silverstone-Jamestown'
 'Glenfield-Jane Heights' 'Highland Creek' 'Elms-Old Rexdale'
 'Forest Hill North' 'Maple Leaf' 'Humbermede' 'Humber Heights-Westmount'
 'Centennial Scarborough' 'Milliken' 'Humber Summit' 'Rustic']

latitude = [43.6459     43.6408     43.64608    ... 43.67552425 43.6584633
 43.64129161]

longitude = [-79.42423    -79.37673    -79.39032    ... -79.44212902 -79.3841276
 -79.39637268]

property_type = ['Entire home' 'Private room in rental unit' 'Entire condo'
 'Entire rental unit' 'Private room in condo' 'Private room in home'
 'Entire townhouse' 'Entire loft' 'Entire guest suite'
 'Private room in townhouse' 'Entire serviced apartment'
 'Shared room in rental unit' 'Private room in guest suite'
 'Entire guesthouse' 'Private room in cottage' 'Entire place'
 'Private room in bungalow' 'Private room in loft' 'Private room'
 'Private room in serviced apartment' 'Entire bungalow'
 'Shared room in home' 'Private room in guesthouse' 'Shared room in condo'
 'Private room in bed and breakfast' 'Shared room in townhouse'
 'Private room in barn' 'Entire villa' 'Tiny home' 'Floor'
 'Private room in villa' 'Shared room in hostel' 'Entire cottage'
 'Private room in castle' 'Shared room in loft' 'Entire home/apt'
 'Private room in hostel' 'Shared room in guesthouse' 'Camper/RV'
 'Room in boutique hotel' 'Shared room in bungalow' 'Earthen home'
 'Shared room in boat' 'Private room in tiny home' 'Room in hotel'
 'Private room in earthen home' 'Boat' 'Island'
 'Private room in casa particular' 'Entire vacation home'
 'Private room in vacation home' 'Room in aparthotel' 'Castle'
 'Shipping container' 'Shared room in bed and breakfast'
 'Shared room in hotel' 'Shared room in casa particular' 'Cave'
 'Private room in cycladic house' 'Shared room']

room_type = ['Entire home/apt' 'Private room' 'Shared room']

accommodates = [10  2  4  1  5  3  6  8  7  9 16 13 14 12 11 15]

bathrooms = [nan 1.  0.5 2.  1.5 2.5 4.  5.  3.  0.  4.5 3.5 5.5 6.5 6.  8. ]

bathrooms_text = ['3 baths' '1.5 baths' '1 bath' '1 private bath' '1 shared bath'
 'Half-bath' '2 baths' '1.5 shared baths' '0 baths' '2.5 baths' '4 baths'
 '5 baths' '2 shared baths' '3.5 baths' '0 shared baths' '3 shared baths'
 '4.5 baths' nan '5.5 baths' '6.5 baths' '4 shared baths'
 '2.5 shared baths' 'Shared half-bath' '3.5 shared baths'
 '4.5 shared baths' '6 baths' 'Private half-bath' '8 baths']
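
The `bathrooms_text` values above mix a count, a shared/private qualifier, and the special "half-bath" wording, while `bathrooms` itself contains `nan`. A minimal sketch of how the text could be turned into a numeric count plus a shared flag is given below. The helper name `parse_bathrooms_text` and the choice to treat any half-bath as 0.5 are assumptions for illustration, not necessarily the approach taken later in this notebook.

```python
import math
import re


def parse_bathrooms_text(text):
    """Hypothetical helper: return (count, is_shared) for one bathrooms_text value.

    Missing values (nan) give (nan, False). Any "half-bath" wording is
    treated as 0.5 baths, which is an assumption.
    """
    if not isinstance(text, str):
        return (math.nan, False)
    lowered = text.lower()
    is_shared = "shared" in lowered
    if "half-bath" in lowered:
        return (0.5, is_shared)
    # Pick up the leading number, e.g. "1.5" in "1.5 shared baths".
    match = re.search(r"\d+(?:\.\d+)?", lowered)
    count = float(match.group()) if match else math.nan
    return (count, is_shared)
```

Assuming the listings are loaded in a pandas DataFrame, the helper could be applied to the column with `Series.map`.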

bedrooms = [ 5. nan  1.  0.  2.  3.  4.  9.  8.  6.  7. 50. 12. 10.]

beds = [nan  2.  1.  3.  4.  5.  6.  0.  7.  8.  9. 10. 12. 11.]

amenities = ['["TV", "First aid kit", "Wifi", "Kitchen", "Dryer", "Essentials", "Indoor fireplace", "Shampoo", "Smoke alarm", "Washer", "Heating", "Air conditioning", "Fire extinguisher"]'
 '["Wifi", "Pool", "TV with standard cable", "Shampoo", "Free parking on premises", "Elevator", "Smoke alarm", "Gym", "Heating", "Air conditioning"]'
 '["Wifi", "Paid parking on premises", "Essentials", "Elevator", "Extra pillows and blankets", "Long term stays allowed", "Iron", "Dedicated workspace", "Electric stove", "Single level home", "Bed linens", "Free washer \\u2013 In unit", "Building staff", "Smoke alarm", "Hot water", "Heating", "Oven", "Children\\u2019s dinnerware", "Hair dryer", "Pets allowed", "Luggage dropoff allowed", "Dishwasher", "Coffee maker", "Free dryer \\u2013 In unit", "Dishes and silverware", "Self check-in", "Microwave", "Patio or balcony", "Fire extinguisher", "Private entrance", "Kitchen", "Refrigerator", "Exercise equipment", "TV with standard cable", "Shared gym in building", "Shampoo", "City skyline view", "Shared pool - available all year", "Carbon monoxide alarm", "Central air conditioning", "Cooking basics", "Private hot tub", "Hangers"]'
 ...
 '["Wifi", "Paid parking on premises", "Dryer", "Elevator", "Long term stays allowed", "Toaster", "Iron", "TV", "Bed linens", "Hot water kettle", "Smoke alarm", "Freezer", "Hot water", "Heating", "Oven", "Air conditioning", "Baking sheet", "Hair dryer", "Pets allowed", "Housekeeping - included with your stay", "Dining table", "Dishwasher", "Coffee maker", "Dishes and silverware", "Pool table", "BBQ grill", "Bathtub", "Microwave", "Exterior security cameras on property", "Kitchen", "Wine glasses", "Movie theater", "Refrigerator", "Private patio or balcony", "Exercise equipment", "Shared gym in building", "Stove", "Blender", "Washer", "Extra pillows and blankets", "Cooking basics", "Clothing storage", "Hangers"]'
 '["Wifi", "Paid parking on premises", "Dryer", "Elevator", "Long term stays allowed", "Toaster", "Iron", "Dedicated workspace", "TV", "Bed linens", "Hot water kettle", "Shared patio or balcony", "Smoke alarm", "Freezer", "Hot water", "Heating", "Oven", "Air conditioning", "Baking sheet", "Hair dryer", "Pets allowed", "Housekeeping - included with your stay", "Dining table", "Dishwasher", "Coffee maker", "Dishes and silverware", "Pool table", "Bathtub", "Microwave", "Exterior security cameras on property", "Kitchen", "Wine glasses", "Movie theater", "Refrigerator", "Exercise equipment", "Shared gym in building", "Stove", "Blender", "Shared sauna", "Washer", "Extra pillows and blankets", "Cooking basics", "Clothing storage", "Hangers"]'
 '["Wifi", "Dryer", "Essentials", "Cleaning products", "Ethernet connection", "Elevator", "Host greets you", "Long term stays allowed", "Toaster", "Iron", "TV", "Bed linens", "Hot water kettle", "Smoke alarm", "Freezer", "Hot water", "Heating", "Oven", "Air conditioning", "Hair dryer", "Coffee", "Luggage dropoff allowed", "Dishwasher", "Coffee maker", "Dishes and silverware", "Microwave", "Patio or balcony", "Conditioner", "Private entrance", "Kitchen", "Refrigerator", "Body soap", "Paid parking off premises", "High chair", "Shampoo", "Stove", "Carbon monoxide alarm", "Washer", "Extra pillows and blankets", "Cooking basics", "Clothing storage", "Hangers"]']
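
Each `amenities` value above is a single string holding a JSON-style list, including escaped characters such as `\u2013`. A minimal sketch of turning one such string into a Python list follows. The helper name `parse_amenities` and the choice to return an empty list for missing values are assumptions.

```python
import json


def parse_amenities(raw):
    """Hypothetical helper: decode one amenities string into a list of names.

    Non-string input (e.g. nan) gives an empty list, which is an assumption.
    """
    if not isinstance(raw, str):
        return []
    # json.loads also decodes escapes such as \u2013 into real characters.
    return json.loads(raw)


# Shortened sample in the same shape as the values printed above.
sample = '["Wifi", "Free washer \\u2013 In unit", "Children\\u2019s dinnerware"]'
amenity_list = parse_amenities(sample)
amenity_count = len(amenity_list)
```

A count such as `amenity_count` is one simple numeric feature that could be derived from this column.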

price = [nan '$172.00 ' '$75.00 ' '$79.00 ' '$126.00 ' '$148.00 ' '$90.00 '
 '$163.00 ' '$50.00 ' '$309.00 ' '$66.00 ' '$129.00 ' '$84.00 ' '$250.00 '
 '$295.00 ' '$92.00 ' '$300.00 ' '$322.00 ' '$80.00 ' '$200.00 ' '$44.00 '
 '$60.00 ' '$280.00 ' '$100.00 ' '$99.00 ' '$288.00 ' '$361.00 '
 '$115.00 ' '$30.00 ' '$62.00 ' '$69.00 ' '$55.00 ' '$279.00 ' '$106.00 '
 '$110.00 ' '$108.00 ' '$399.00 ' '$97.00 ' '$324.00 ' '$65.00 '
 '$149.00 ' '$119.00 ' '$45.00 ' '$150.00 ' '$120.00 ' '$190.00 '
 '$83.00 ' '$95.00 ' '$180.00 ' '$500.00 ' '$116.00 ' '$145.00 '
 '$444.00 ' '$440.00 ' '$271.00 ' '$278.00 ' '$98.00 ' '$88.00 '
 '$1,000.00 ' '$87.00 ' '$196.00 ' '$475.00 ' '$470.00 ' '$350.00 '
 '$121.00 ' '$160.00 ' '$130.00 ' '$125.00 ' '$439.00 ' '$78.00 '
 '$225.00 ' '$255.00 ' '$222.00 ' '$186.00 ' '$77.00 ' '$275.00 '
 '$71.00 ' '$135.00 ' '$131.00 ' '$72.00 ' '$166.00 ' '$152.00 '
 '$140.00 ' '$214.00 ' '$156.00 ' '$101.00 ' '$91.00 ' '$168.00 '
 '$396.00 ' '$85.00 ' '$93.00 ' '$187.00 ' '$128.00 ' '$220.00 '
 '$249.00 ' '$59.00 ' '$449.00 ' '$170.00 ' '$499.00 ' '$330.00 '
 '$155.00 ' '$218.00 ' '$109.00 ' '$269.00 ' '$175.00 ' '$236.00 '
 '$265.00 ' '$53.00 ' '$146.00 ' '$153.00 ' '$86.00 ' '$173.00 ' '$81.00 '
 '$57.00 ' '$76.00 ' '$326.00 ' '$302.00 ' '$167.00 ' '$216.00 '
 '$162.00 ' '$105.00 ' '$259.00 ' '$185.00 ' '$157.00 ' '$888.00 '
 '$246.00 ' '$237.00 ' '$598.00 ' '$111.00 ' '$89.00 ' '$61.00 '
 '$268.00 ' '$230.00 ' '$370.00 ' '$349.00 ' '$143.00 ' '$550.00 '
 '$64.00 ' '$70.00 ' '$224.00 ' '$94.00 ' '$999.00 ' '$364.00 ' '$191.00 '
 '$138.00 ' '$179.00 ' '$264.00 ' '$124.00 ' '$67.00 ' '$400.00 '
 '$198.00 ' '$282.00 ' '$489.00 ' '$137.00 ' '$799.00 ' '$254.00 '
 '$258.00 ' '$338.00 ' '$63.00 ' '$852.00 ' '$313.00 ' '$54.00 '
 '$189.00 ' '$134.00 ' '$659.00 ' '$96.00 ' '$39.00 ' '$690.00 '
 '$341.00 ' '$182.00 ' '$895.00 ' '$375.00 ' '$229.00 ' '$445.00 '
 '$164.00 ' '$335.00 ' '$122.00 ' '$266.00 ' '$139.00 ' '$169.00 '
 '$383.00 ' '$42.00 ' '$58.00 ' '$900.00 ' '$112.00 ' '$68.00 ' '$133.00 '
 '$141.00 ' '$240.00 ' '$405.00 ' '$123.00 ' '$299.00 ' '$307.00 '
 '$118.00 ' '$48.00 ' '$213.00 ' '$773.00 ' '$104.00 ' '$298.00 '
 '$102.00 ' '$41.00 ' '$273.00 ' '$29.00 ' '$485.00 ' '$385.00 '
 '$352.00 ' '$40.00 ' '$147.00 ' '$286.00 ' '$233.00 ' '$434.00 '
 '$195.00 ' '$320.00 ' '$540.00 ' '$208.00 ' '$113.00 ' '$491.00 '
 '$245.00 ' '$82.00 ' '$221.00 ' '$136.00 ' '$241.00 ' '$270.00 '
 '$33.00 ' '$35.00 ' '$316.00 ' '$339.00 ' '$293.00 ' '$142.00 '
 '$248.00 ' '$450.00 ' '$424.00 ' '$580.00 ' '$205.00 ' '$199.00 '
 '$291.00 ' '$161.00 ' '$51.00 ' '$325.00 ' '$210.00 ' '$235.00 '
 '$398.00 ' '$890.00 ' '$403.00 ' '$165.00 ' '$127.00 ' '$171.00 '
 '$528.00 ' '$244.00 ' '$204.00 ' '$337.00 ' '$56.00 ' '$329.00 '
 '$151.00 ' '$38.00 ' '$289.00 ' '$215.00 ' '$599.00 ' '$262.00 '
 '$547.00 ' '$183.00 ' '$47.00 ' '$1,828.00 ' '$194.00 ' '$263.00 '
 '$52.00 ' '$586.00 ' '$549.00 ' '$46.00 ' '$277.00 ' '$260.00 '
 '$514.00 ' '$73.00 ' '$178.00 ' '$321.00 ' '$103.00 ' '$203.00 '
 '$1,451.00 ' '$297.00 ' '$411.00 ' '$379.00 ' '$281.00 ' '$829.00 '
 '$36.00 ' '$242.00 ' '$211.00 ' '$750.00 ' '$2,000.00 ' '$318.00 '
 '$74.00 ' '$590.00 ' '$914.00 ' '$421.00 ' '$5,000.00 ' '$219.00 '
 '$1,200.00 ' '$480.00 ' '$132.00 ' '$285.00 ' '$296.00 ' '$417.00 '
 '$414.00 ' '$529.00 ' '$456.00 ' '$251.00 ' '$239.00 ' '$395.00 '
 '$154.00 ' '$595.00 ' '$256.00 ' '$114.00 ' '$345.00 ' '$600.00 '
 '$380.00 ' '$49.00 ' '$238.00 ' '$247.00 ' '$3,500.00 ' '$428.00 '
 '$43.00 ' '$290.00 ' '$371.00 ' '$28.00 ' '$177.00 ' '$159.00 '
 '$212.00 ' '$10,000.00 ' '$234.00 ' '$158.00 ' '$328.00 ' '$538.00 '
 '$184.00 ' '$197.00 ' '$957.00 ' '$107.00 ' '$756.00 ' '$431.00 '
 '$721.00 ' '$176.00 ' '$2,026.00 ' '$856.00 ' '$231.00 ' '$202.00 '
 '$358.00 ' '$369.00 ' '$310.00 ' '$764.00 ' '$995.00 ' '$276.00 '
 '$315.00 ' '$181.00 ' '$37.00 ' '$629.00 ' '$886.00 ' '$201.00 '
 '$577.00 ' '$569.00 ' '$232.00 ' '$253.00 ' '$800.00 ' '$929.00 '
 '$243.00 ' '$9,999.00 ' '$206.00 ' '$553.00 ' '$303.00 ' '$217.00 '
 '$314.00 ' '$783.00 ' '$571.00 ' '$393.00 ' '$31.00 ' '$117.00 '
 '$207.00 ' '$579.00 ' '$1,895.00 ' '$34.00 ' '$636.00 ' '$344.00 '
 '$663.00 ' '$589.00 ' '$625.00 ' '$925.00 ' '$465.00 ' '$272.00 '
 '$228.00 ' '$493.00 ' '$536.00 ' '$188.00 ' '$407.00 ' '$427.00 '
 '$404.00 ' '$593.00 ' '$283.00 ' '$413.00 ' '$453.00 ' '$526.00 '
 '$546.00 ' '$466.00 ' '$515.00 ' '$679.00 ' '$343.00 ' '$591.00 '
 '$257.00 ' '$174.00 ' '$850.00 ' '$495.00 ' '$736.00 ' '$419.00 '
 '$420.00 ' '$306.00 ' '$647.00 ' '$356.00 ' '$394.00 ' '$144.00 '
 '$467.00 ' '$267.00 ' '$430.00 ' '$32.00 ' '$363.00 ' '$455.00 '
 '$1,656.00 ' '$323.00 ' '$418.00 ' '$412.00 ' '$426.00 ' '$710.00 '
 '$333.00 ' '$360.00 ' '$684.00 ' '$274.00 ' '$252.00 ' '$192.00 '
 '$209.00 ' '$877.00 ' '$1,827.00 ' '$772.00 ' '$359.00 ' '$950.00 '
 '$362.00 ' '$714.00 ' '$699.00 ' '$785.00 ' '$15.00 ' '$457.00 '
 '$12,400.00 ' '$649.00 ' '$226.00 ' '$389.00 ' '$376.00 ' '$700.00 '
 '$429.00 ' '$406.00 ' '$284.00 ' '$1,085.00 ' '$336.00 ' '$655.00 '
 '$384.00 ' '$386.00 ' '$292.00 ' '$937.00 ' '$3,570.00 ' '$193.00 '
 '$342.00 ' '$223.00 ' '$261.00 ' '$608.00 ' '$425.00 ' '$438.00 '
 '$650.00 ' '$319.00 ' '$3,000.00 ' '$27.00 ' '$304.00 ' '$739.00 '
 '$760.00 ' '$2,050.00 ' '$1,151.00 ' '$459.00 ' '$433.00 ' '$19.00 '
 '$3.00 ' '$1,999.00 ' '$693.00 ' '$473.00 ' '$354.00 ' '$423.00 '
 '$340.00 ' '$317.00 ' '$401.00 ' '$25.00 ' '$585.00 ' '$509.00 '
 '$22.00 ' '$920.00 ' '$507.00 ' '$442.00 ' '$1,351.00 ' '$441.00 '
 '$451.00 ' '$698.00 ' '$725.00 ' '$381.00 ' '$357.00 ' '$656.00 '
 '$792.00 ' '$365.00 ' '$1,500.00 ' '$334.00 ' '$347.00 ' '$504.00 '
 '$227.00 ' '$287.00 ' '$377.00 ' '$657.00 ' '$368.00 ' '$539.00 '
 '$1,400.00 ' '$1,295.00 ' '$355.00 ' '$415.00 ' '$372.00 ' '$382.00 '
 '$443.00 ' '$820.00 ' '$20.00 ' '$986.00 ' '$402.00 ' '$374.00 '
 '$2,150.00 ' '$331.00 ' '$479.00 ' '$1,199.00 ' '$408.00 ' '$745.00 '
 '$825.00 ' '$447.00 ' '$378.00 ' '$564.00 ' '$594.00 ' '$749.00 '
 '$662.00 ' '$711.00 ' '$532.00 ' '$1,542.00 ' '$1,165.00 ' '$305.00 '
 '$471.00 ' '$543.00 ' '$478.00 ' '$689.00 ' '$410.00 ' '$1,795.00 '
 '$484.00 ' '$849.00 ' '$542.00 ' '$294.00 ' '$611.00 ' '$541.00 '
 '$332.00 ' '$446.00 ' '$522.00 ' '$1,171.00 ' '$555.00 ' '$498.00 '
 '$587.00 ' '$530.00 ' '$592.00 ' '$26.00 ' '$628.00 ' '$842.00 '
 '$523.00 ' '$448.00 ' '$346.00 ' '$943.00 ' '$915.00 ' '$21.00 '
 '$390.00 ' '$14.00 ' '$734.00 ' '$614.00 ' '$642.00 ' '$694.00 '
 '$581.00 ' '$841.00 ' '$460.00 ' '$748.00 ' '$391.00 ' '$605.00 '
 '$437.00 ' '$18.00 ' '$436.00 ' '$1,160.00 ' '$557.00 ' '$327.00 '
 '$494.00 ' '$678.00 ' '$521.00 ' '$621.00 ' '$828.00 ' '$533.00 '
 '$519.00 ' '$819.00 ' '$676.00 ' '$827.00 ' '$845.00 ' '$1,850.00 '
 '$476.00 ' '$397.00 ' '$560.00 ' '$373.00 ' '$311.00 ' '$308.00 '
 '$818.00 ' '$524.00 ' '$2,500.00 ' '$301.00 ' '$351.00 ' '$746.00 '
 '$607.00 ' '$795.00 ' '$839.00 ' '$791.00 ' '$388.00 ' '$1,365.00 '
 '$837.00 ' '$789.00 ' '$545.00 ' '$612.00 ' '$588.00 ' '$416.00 '
 '$367.00 ' '$4,000.00 ' '$481.00 ' '$312.00 ' '$469.00 ' '$948.00 '
 '$762.00 ' '$692.00 ' '$4,500.00 ' '$686.00 ' '$387.00 ' '$670.00 '
 '$1,335.00 ' '$1,183.00 ' '$574.00 ' '$604.00 ' '$1,515.00 ' '$1,450.00 '
 '$664.00 ' '$742.00 ' '$1,439.00 ' '$624.00 ' '$525.00 ' '$671.00 '
 '$666.00 ' '$606.00 ' '$511.00 ' '$353.00 ' '$483.00 ' '$366.00 '
 '$1,057.00 ' '$981.00 ' '$462.00 ' '$568.00 ' '$464.00 ' '$510.00 '
 '$794.00 ' '$730.00 ' '$8,002.00 ' '$775.00 ' '$905.00 ' '$761.00 '
 '$558.00 ' '$1,415.00 ' '$646.00 ' '$573.00 ' '$1,014.00 ' '$1,750.00 '
 '$575.00 ' '$674.00 ' '$501.00 ' '$1,185.00 ' '$899.00 ' '$497.00 '
 '$554.00 ' '$4,119.00 ' '$1,430.00 ' '$781.00 ' '$1,603.00 ' '$853.00 '
 '$1,040.00 ' '$616.00 ' '$731.00 ' '$1,100.00 ' '$1,426.00 ' '$615.00 '
 '$1,143.00 ' '$1,600.00 ' '$409.00 ' '$1,300.00 ' '$720.00 ' '$776.00 '
 '$584.00 ' '$840.00 ' '$613.00 ' '$552.00 ' '$643.00 ' '$988.00 '
 '$1,520.00 ' '$713.00 ' '$392.00 ' '$435.00 ' '$1,177.00 ' '$348.00 '
 '$1,279.00 ' '$520.00 ' '$871.00 ' '$518.00 ' '$562.00 ' '$946.00 '
 '$637.00 ' '$610.00 ' '$665.00 ' '$4,310.00 ' '$24.00 ' '$422.00 '
 '$717.00 ' '$548.00 ' '$959.00 ' '$630.00 ' '$924.00 ' '$486.00 '
 '$640.00 ' '$488.00 ' '$1,252.00 ' '$474.00 ' '$751.00 ' '$534.00 '
 '$517.00 ' '$998.00 ' '$487.00 ' '$8,000.00 ' '$513.00 ' '$790.00 '
 '$780.00 ' '$490.00 ' '$8,820.00 ' '$1,384.00 ' '$1,352.00 ' '$2,200.00 '
 '$477.00 ' '$960.00 ' '$620.00 ' '$744.00 ' '$722.00 ' '$458.00 '
 '$836.00 ' '$1,121.00 ' '$738.00 ' '$461.00 ' '$970.00 ' '$516.00 '
 '$857.00 ' '$506.00 ' '$875.00 ' '$1,757.00 ' '$556.00 ' '$945.00 '
 '$705.00 ' '$983.00 ' '$609.00 ' '$576.00 ' '$787.00 ' '$672.00 '
 '$661.00 ' '$535.00 ' '$1,220.00 ' '$1,770.00 ' '$578.00 ' '$454.00 '
 '$851.00 ' '$527.00 ' '$1,343.00 ' '$531.00 ' '$570.00 ' '$12.00 '
 '$432.00 ' '$502.00 ' '$811.00 ' '$468.00 ' '$864.00 ' '$846.00 '
 '$561.00 ' '$2,805.00 ' '$866.00 ' '$2,650.00 ' '$916.00 ' '$23.00 '
 '$1,350.00 ' '$685.00 ' '$1,465.00 ' '$1,059.00 ' '$1,114.00 ' '$902.00 '
 '$796.00 ' '$833.00 ']
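
The `price` values above are strings with a leading `$`, thousands separators, and a trailing space, together with `nan` for missing prices, so the column needs to be converted before it can serve as a numeric target. The sketch below shows one way to do this. The helper name `parse_price` is an assumption and is not taken from the notebook's own cleaning code.

```python
import math


def parse_price(value):
    """Hypothetical helper: convert a price string such as '$1,000.00 ' to float.

    Missing values (nan) stay nan so they can be imputed or dropped later.
    """
    if not isinstance(value, str):
        return math.nan
    # Remove surrounding spaces, the currency sign, and thousands separators.
    cleaned = value.strip().replace("$", "").replace(",", "")
    return float(cleaned)
```

Assuming the listings are loaded in a pandas DataFrame, the same helper could be applied to the whole column with `Series.map`.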

minimum_nights = [  28  180   90  750   91  120  150    3   85   31   18    2   29   30
    5    1  183   10    4  365   80  200   60    7   13  100   14   32
  700   21   12   40  240   56   45  210    6   89 1124  300  185  250
  366   88   62 1000   84 1125  119   74    8  135   20   22  730  360
   57   75   99  175  181  179  140  184  168  160    9  500   65  299
   92   35   50  333  110  450   59   58  182  359  239  128  137  124
  375   55  121  220  114  130   47   15   42  364   49 1120  600 1100
  153   44  170  358   64  155  228   33  270   25  174  330   34]

maximum_nights = [  730   365  1125    90  1100   125    30   180   400    31   500    60
   162   182   150  1124    50   270   112   366  1000     7   120    99
    28   100    14   160   250   460   999   108    19   700    35   360
   130    15    21    64    56    10   190     3   300    48    45    62
   179     4   260    27   600   200    17    20   364  1123    80  3000
    32    33   888    52   285     9    40    93    29  1095   900     2
   102    75    61   601     5   122    72   352    38   650    42    95
   555    16    91    22   240    12    85   135    71  1114  1111    88
    87    36   731   121  2000   550    70    65    25  3650   450    92
   375   185   665   356   181   729   222    67   777   187    26   800
   829    55   399    96   210   139   395     6    53   168    73   186
    58   101   380    44    46   355 10001   666   362   720   350    68
    89    13   115  1121   123   979   105   214   340   165    59   220
    41   175   280   265    11   183   342     8    18    34    51   110
   140    84   320    74   193    63    49   114   201   225   725   590
    39   184    79    94   236    69   124    37   178   106   170   367
   330    23   290  1120   325   188   104   245    24   465   152   128
   111   368    66   393   281   370   275     1   161   149   145   155
   230   279   118   138   117   369   176   420    97   336   239   358
   326   335   299   127   189   361   363   346    98   307   177    83
   195]

minimum_minimum_nights = [  28  180   90  750   91  120  150    1   60   10  185   18    2   29
   30    5  183   31    3    4  365   80   32    7   13  100   14  700
   21   12   40  240   56  210    6   89   36   45 1124  300  250  366
   88   62 1000   84  500 1125  119   74    8  135   22  730   92   50
  360   57   75   99  175  181  179  140  184  168   65  178   35   85
  333  110  350   59   19   53  160    9  200   58  182  359  103  239
  128  137   42  124  375   55  121  220  114  130   47  280   15   49
   20   16 1120  600 1100  153   44  170  358   64  155  228   33  270
  364   25  174  330   34]

maximum_minimum_nights = [  28  180   90  750   91  120  150    4   85  185   18    2   29   30
    5  183   31   10    3    1  365   80  200   60    7   13  100   14
   32  700   21   12   40  240   56    6  210   89  133   45   55 1124
  300  250  366   88   62 1000   84  552 1125  119   74    8  135   22
  730   50  360   57   75   99  175  181  179   58  140  184  168  349
    9   95  500   65  178   92   35  333  110   59  160  106  182  359
  280  239  128  137  124  375  121  220  114  130   20   47  235   61
  290   15   42  364   49  222 1120  600  225 1100  153   44  170  358
   27   64  155  228   33   39  270   69   25  174  330  350   16   34
   24]

minimum_maximum_nights = [  730   365  1125    90  1100   125    30   180   400    31   500    60
   162   182   150  1124    50   270   112   366  1000     7   120    99
    28   100    14   160     4   250   460   999   108    19   700    35
   360   130    21    64    56    10   190     3   300    48    62   179
   260    27   600   200    17    20  3000    32    45    33   888    52
   285     9    40    93    29  1095   900     2   102    75    61   601
     5   364    72   352    38   650    42    95   555    16    91   240
    12    85   135    71  1114    88    87    36   731   121  2000   550
    70   122    65    25  3650   450    92    80   375   185    15   356
   181    67   187    26   800   829    55   399    96   210   139     1
   395     6   168    73   186    58   380    44    46 10001   666   362
   350    68    89    13  1123   115    53   979   105   720   214   340
    59   220    41   280   265   183   355   342     8    18    34    51
   110   140   320    74   193    63    11    84   201   225   241   590
    39   184   725    94   236   124    37   106   170   367   330    23
  1120   290   188    79   175   465   128   111   368   152   393   281
   370   161   149    24   145    66   155   729   230   279   118   138
   245   117   159   369   176   420    97   239    22   358   326   335
   165   123   127   189   361   363   346    98   307   177    83   178
   195]

maximum_maximum_nights = [  730   365  1125    90  1100   125    30   180   400    31   500    60
   162   182   150  1124    50   270   112   366  1000     7   120    99
    28   100    14   160   250   460   999   108    19   700    35   360
   130    21    64    56    10   190     3   300    48    62   179     4
   260    27   600   200    17    20  3000    32    45    33   888    52
   285     9    40    93    29  1095   900     2   102    75    61   601
     5   364    72   352    38   650    42    95   555    16    91   240
    12    85   135    71  1114    88    87    36   731   121  2000   550
    70   122    65    25  3650   450    92    80   375   185    15   356
   181    67   187    26   800   829    55   399    96   210   139   395
     6   168    73   186    58   380    44    46 10001   666   362   350
    68    89    13  1123   115   979   105   720   214   340    59   220
    41   280   265   183   355   342     8    18    34    51   110   140
   320    74   193    63    11    84   201   225   590    39   184   725
    94   236   124    37   106   170   367   330    23  1120   290   188
    79   175   465   128   111   368   152   393   281   370     1   161
   149    24   145    66   155   729   230   279   118   138   245   117
   369   176   420    97   239    22   358   326   335   165   123   127
   189   361   363   346    98   307   177    83   178   195]

minimum_nights_avg_ntm = [2.800e+01 1.800e+02 9.000e+01 7.500e+02 9.100e+01 1.200e+02 1.500e+02
 3.900e+00 6.140e+01 2.760e+01 1.850e+02 1.800e+01 2.000e+00 2.900e+01
 3.000e+01 5.000e+00 1.880e+01 1.830e+02 3.100e+01 3.300e+00 9.900e+00
 3.000e+00 2.830e+01 1.000e+00 4.000e+00 1.350e+01 6.200e+00 3.650e+02
 8.000e+01 1.230e+01 1.669e+02 6.000e+01 2.850e+01 7.000e+00 1.300e+01
 1.000e+02 1.400e+01 3.200e+01 7.000e+02 1.100e+00 2.900e+00 2.100e+01
 1.200e+01 1.640e+01 4.000e+01 2.400e+02 9.300e+00 1.400e+00 2.770e+01
 5.600e+01 5.840e+01 5.100e+00 2.100e+02 8.410e+01 6.000e+00 2.890e+01
 8.900e+01 1.000e+01 1.500e+00 1.247e+02 1.300e+00 4.300e+00 1.280e+01
 1.590e+01 1.960e+01 4.500e+01 3.500e+01 1.124e+03 3.000e+02 2.910e+01
 2.300e+00 2.500e+02 3.660e+02 2.600e+00 2.810e+01 8.800e+01 6.200e+01
 6.300e+00 4.800e+00 6.900e+00 2.870e+01 1.000e+03 4.900e+00 6.930e+01
 8.400e+01 5.433e+02 1.125e+03 2.100e+00 1.190e+02 7.400e+01 1.210e+01
 1.688e+02 8.420e+01 8.000e+00 1.350e+02 2.200e+01 7.300e+02 4.710e+01
 2.400e+00 1.148e+02 5.000e+01 8.440e+01 3.600e+02 5.700e+01 8.630e+01
 1.700e+00 7.500e+01 9.900e+01 1.750e+02 1.810e+02 4.280e+01 2.720e+01
 3.040e+01 5.300e+01 2.180e+01 2.790e+01 2.750e+01 2.700e+00 1.790e+02
 5.750e+01 1.290e+01 3.600e+00 1.340e+01 5.140e+01 3.430e+01 1.450e+01
 2.560e+01 5.670e+01 1.400e+02 5.760e+01 5.500e+00 1.840e+02 1.680e+02
 1.410e+01 3.442e+02 5.720e+01 7.080e+01 9.000e+00 4.940e+01 5.000e+02
 1.780e+01 4.100e+00 5.430e+01 7.610e+01 4.090e+01 7.700e+00 8.540e+01
 6.500e+01 4.400e+00 2.690e+01 1.780e+02 3.200e+00 9.200e+01 4.670e+01
 4.200e+00 1.760e+01 1.900e+00 5.200e+00 9.560e+01 2.330e+01 2.000e+01
 8.500e+01 8.900e+00 1.860e+01 3.290e+01 3.330e+02 1.010e+01 5.730e+01
 1.200e+00 6.500e+00 5.540e+01 1.100e+02 1.490e+01 1.650e+01 2.950e+01
 3.629e+02 5.900e+01 2.990e+01 5.910e+01 5.440e+01 5.330e+01 6.900e+01
 8.600e+01 3.350e+01 2.980e+01 2.240e+01 1.600e+02 2.200e+00 4.730e+01
 3.090e+01 8.700e+00 2.840e+01 2.860e+01 2.914e+02 3.400e+00 3.070e+01
 1.430e+01 2.660e+01 1.970e+01 9.100e+00 3.020e+01 6.990e+01 3.800e+01
 1.380e+01 5.800e+00 2.800e+00 8.600e+00 5.740e+01 3.080e+01 1.800e+00
 5.550e+01 5.800e+01 4.250e+01 2.000e+02 3.710e+01 5.400e+00 1.820e+02
 6.400e+00 3.590e+02 3.100e+00 2.700e+01 9.400e+00 5.780e+01 1.578e+02
 2.329e+02 2.390e+02 1.280e+02 2.390e+01 2.970e+01 1.370e+02 2.280e+01
 3.330e+01 4.600e+01 2.540e+01 3.590e+01 3.210e+01 1.092e+02 8.510e+01
 7.200e+00 1.240e+02 5.600e+00 6.760e+01 3.750e+02 3.010e+01 5.640e+01
 5.500e+01 1.210e+02 8.460e+01 4.540e+01 2.200e+02 1.140e+02 4.410e+01
 8.230e+01 4.330e+01 5.690e+01 1.300e+02 1.065e+02 1.540e+01 2.820e+01
 2.380e+01 1.070e+01 4.360e+01 1.140e+01 5.850e+01 3.060e+01 8.090e+01
 4.700e+01 1.070e+02 1.955e+02 6.600e+00 2.590e+01 4.240e+01 3.860e+01
 2.450e+01 2.270e+01 3.260e+01 4.700e+00 6.330e+01 1.160e+01 9.740e+01
 1.910e+01 1.470e+01 1.250e+01 2.800e+02 7.220e+01 2.738e+02 4.600e+00
 3.900e+01 1.500e+01 8.300e+01 4.200e+01 8.110e+01 2.696e+02 5.490e+01
 9.180e+01 2.930e+01 1.038e+02 4.900e+01 2.564e+02 7.320e+01 1.950e+01
 5.520e+01 1.600e+00 1.870e+01 3.120e+01 4.920e+01 1.055e+02 8.010e+01
 8.860e+01 4.390e+01 2.670e+01 9.980e+01 1.025e+02 1.110e+01 8.200e+00
 3.560e+01 1.120e+03 1.090e+01 1.318e+02 2.550e+01 9.020e+01 5.700e+00
 1.040e+01 4.210e+01 2.250e+01 2.090e+01 8.570e+01 1.659e+02 4.500e+00
 2.960e+01 3.370e+01 6.000e+02 8.450e+01 6.100e+00 1.120e+01 8.870e+01
 2.440e+01 1.077e+02 5.810e+01 4.760e+01 1.683e+02 2.740e+01 1.720e+01
 1.100e+03 1.600e+01 7.800e+00 1.180e+01 1.530e+02 1.320e+01 4.400e+01
 1.560e+01 1.050e+01 1.129e+02 2.780e+01 2.060e+01 5.680e+01 8.100e+00
 3.441e+02 3.700e+00 2.230e+01 7.100e+00 1.310e+01 1.480e+01 9.720e+01
 7.740e+01 2.340e+01 1.750e+01 9.800e+00 1.240e+01 1.020e+01 5.410e+01
 1.170e+01 1.700e+02 1.580e+01 1.530e+01 1.550e+01 1.150e+01 3.580e+02
 2.580e+01 4.470e+01 4.780e+01 4.260e+01 6.400e+01 5.300e+00 1.550e+02
 1.461e+02 4.720e+01 2.420e+01 3.800e+00 4.490e+01 2.370e+01 2.500e+00
 2.220e+01 3.570e+01 1.796e+02 8.800e+00 4.310e+01 2.360e+01 1.060e+01
 2.610e+01 2.430e+01 2.280e+02 2.050e+01 2.310e+01 2.410e+01 5.900e+00
 2.210e+01 2.570e+01 2.630e+01 6.800e+00 7.300e+00 3.140e+01 1.710e+01
 3.970e+01 4.440e+01 2.110e+01 2.029e+02 1.455e+02 2.020e+01 3.300e+01
 1.030e+01 2.400e+01 1.680e+01 1.390e+01 1.460e+01 1.138e+02 1.330e+01
 1.775e+02 1.360e+01 1.420e+01 3.500e+00 3.700e+01 1.440e+01 1.724e+02
 9.600e+00 2.880e+01 8.300e+00 3.640e+01 2.700e+02 3.220e+01 1.740e+01
 1.510e+01 3.550e+01 2.510e+01 8.960e+01 2.620e+01 3.050e+01 1.990e+01
 4.380e+01 2.470e+01 9.700e+00 5.660e+01 5.570e+01 7.490e+01 1.049e+02
 1.700e+01 1.162e+02 7.810e+01 9.730e+01 2.650e+01 2.520e+01 6.550e+01
 1.820e+01 5.310e+01 1.520e+01 1.080e+01 1.190e+01 3.110e+01 1.670e+01
 3.470e+01 3.630e+01 2.500e+01 8.520e+01 2.730e+01 1.689e+02 2.300e+01
 9.450e+01 1.810e+01 1.980e+01 7.300e+01 3.640e+02 4.320e+01 5.880e+01
 1.260e+01 3.540e+01 2.920e+01 1.100e+01 1.655e+02 5.150e+01 7.500e+00
 7.800e+01 1.639e+02 3.280e+01 1.610e+01 3.030e+01 2.160e+01 1.740e+02
 3.840e+01 6.120e+01 3.395e+02 1.220e+01 3.300e+02 1.840e+01 3.620e+01
 2.030e+01 2.120e+01 2.680e+01 1.690e+01 1.543e+02 2.040e+01 5.590e+01
 8.250e+01 1.327e+02 3.410e+01 3.334e+02 1.270e+01 1.930e+01 2.130e+01
 4.160e+01 8.590e+01 2.290e+01 2.070e+01 2.140e+01 3.940e+01 9.570e+01
 2.260e+01 1.067e+02 4.230e+01 8.480e+01 7.880e+01 3.400e+01 6.700e+00
 9.430e+01]

maximum_nights_avg_ntm = [7.3000e+02 3.6500e+02 1.1250e+03 9.0000e+01 1.1000e+03 1.2500e+02
 3.0000e+01 1.8000e+02 4.0000e+02 3.1000e+01 5.0000e+02 6.0000e+01
 1.6200e+02 1.8200e+02 1.5000e+02 1.1240e+03 5.0000e+01 2.7000e+02
 1.1200e+02 3.6600e+02 1.0000e+03 7.0000e+00 1.2000e+02 9.9000e+01
 2.8000e+01 1.0000e+02 1.4000e+01 1.6000e+02 2.6700e+01 7.8850e+02
 2.5000e+02 4.6000e+02 9.9900e+02 1.0800e+02 1.9000e+01 7.0000e+02
 3.5000e+01 3.6000e+02 1.3000e+02 1.0264e+03 2.1000e+01 6.4000e+01
 6.2590e+02 5.6000e+01 1.0000e+01 1.9000e+02 3.0000e+00 3.0000e+02
 4.8000e+01 6.2000e+01 1.7900e+02 4.0000e+00 2.6000e+02 2.7000e+01
 6.0000e+02 2.0000e+02 1.7000e+01 2.0000e+01 3.0000e+03 3.2000e+01
 4.5000e+01 3.3000e+01 8.8800e+02 5.2000e+01 2.8500e+02 9.0000e+00
 4.0000e+01 9.3000e+01 2.9000e+01 1.0950e+03 9.0000e+02 2.0000e+00
 1.0200e+02 7.5000e+01 6.1000e+01 6.0100e+02 5.0000e+00 3.6400e+02
 7.2000e+01 3.5200e+02 3.8000e+01 6.5000e+02 4.2000e+01 9.5000e+01
 5.5500e+02 1.6000e+01 9.1000e+01 1.1172e+03 2.4000e+02 1.2000e+01
 8.5000e+01 1.3500e+02 7.1000e+01 1.1140e+03 8.8000e+01 8.7000e+01
 3.6000e+01 7.3100e+02 9.8840e+02 1.2100e+02 2.0000e+03 5.5000e+02
 7.0000e+01 1.2200e+02 6.5000e+01 2.5000e+01 3.6500e+03 4.5000e+02
 9.2000e+01 8.0000e+01 3.7500e+02 3.3140e+02 1.8500e+02 1.5000e+01
 3.5600e+02 1.8100e+02 6.7000e+01 3.9830e+02 1.8700e+02 2.6000e+01
 8.0000e+02 1.6680e+02 8.2900e+02 5.5000e+01 3.9900e+02 9.6000e+01
 2.1000e+02 1.3900e+02 1.0675e+03 7.9080e+02 6.5490e+02 3.9500e+02
 6.0000e+00 1.6800e+02 7.3000e+01 6.3160e+02 1.8600e+02 5.8000e+01
 1.0966e+03 3.8000e+02 6.5280e+02 6.2970e+02 4.4000e+01 4.6000e+01
 1.0001e+04 2.4610e+02 6.6600e+02 1.0970e+03 3.6200e+02 8.1150e+02
 3.5000e+02 6.8000e+01 8.9000e+01 9.4100e+02 1.3000e+01 1.1230e+03
 1.1500e+02 5.4400e+01 9.7900e+02 1.0500e+02 1.1173e+03 7.2000e+02
 2.1400e+02 3.4000e+02 5.9000e+01 2.2000e+02 4.1000e+01 2.8000e+02
 2.6500e+02 4.4900e+01 1.8300e+02 3.5500e+02 3.4200e+02 8.0000e+00
 5.3660e+02 1.8000e+01 1.0767e+03 3.1950e+02 3.4000e+01 5.1000e+01
 1.1000e+02 1.4000e+02 2.6100e+01 3.2000e+02 7.4000e+01 6.8470e+02
 1.5670e+02 1.9300e+02 6.3000e+01 1.1000e+01 5.6640e+02 1.1120e+03
 8.4000e+01 2.0100e+02 2.2500e+02 3.4640e+02 2.8070e+02 2.9380e+02
 3.1830e+02 2.0600e+02 8.5460e+02 8.1380e+02 5.4450e+02 5.9000e+02
 3.9000e+01 1.8400e+02 7.2500e+02 3.5710e+02 1.6100e+02 5.2110e+02
 9.6020e+02 8.8060e+02 9.1100e+02 9.4000e+01 1.6650e+02 1.5960e+02
 1.5890e+02 4.6290e+02 3.8130e+02 1.6450e+02 2.1930e+02 2.9940e+02
 2.3600e+02 8.8890e+02 3.0700e+02 1.4070e+02 1.5600e+02 1.9180e+02
 8.6030e+02 1.1199e+03 8.1330e+02 1.1174e+03 1.2400e+02 3.7000e+01
 1.0600e+02 1.7000e+02 3.6700e+02 1.0547e+03 1.6560e+02 3.3000e+02
 2.3000e+01 1.1200e+03 2.9000e+02 2.9030e+02 1.0945e+03 3.4290e+02
 3.3970e+02 7.6430e+02 4.6940e+02 1.8800e+02 1.0996e+03 1.9240e+02
 7.9000e+02 7.9000e+01 6.9600e+02 1.3500e+01 7.6780e+02 7.7480e+02
 4.1750e+02 6.3290e+02 1.5600e+01 4.8230e+02 1.7500e+02 4.6500e+02
 1.2800e+02 1.1100e+02 1.4250e+02 9.6260e+02 2.9500e+01 3.6800e+02
 1.5200e+02 3.9300e+02 1.1100e+01 2.8100e+02 3.7000e+02 1.0000e+00
 1.4020e+02 1.4900e+02 1.5300e+02 4.2280e+02 7.9490e+02 5.9260e+02
 5.9930e+02 4.9110e+02 2.4000e+01 9.2380e+02 6.6620e+02 6.7320e+02
 7.8350e+02 7.4750e+02 3.8260e+02 1.0921e+03 1.8300e+01 1.4500e+02
 6.6000e+01 2.9590e+02 1.5500e+02 7.2900e+02 2.3000e+02 2.9970e+02
 1.1232e+03 9.3030e+02 1.1054e+03 5.6400e+01 2.7900e+02 1.8870e+02
 1.1800e+02 1.7250e+02 1.3800e+02 2.4500e+02 1.1700e+02 2.7300e+01
 2.6080e+02 1.0422e+03 2.9200e+02 2.4120e+02 2.5980e+02 2.4570e+02
 2.6340e+02 2.2620e+02 6.5920e+02 8.1540e+02 5.6370e+02 1.2020e+02
 9.7910e+02 2.7530e+02 3.6900e+02 1.8610e+02 3.5670e+02 1.7600e+02
 4.2000e+02 8.4660e+02 6.9250e+02 1.0397e+03 9.7000e+01 2.7800e+01
 1.1245e+03 3.5690e+02 3.6340e+02 1.0640e+02 3.2850e+02 2.3900e+02
 2.8600e+01 1.0555e+03 1.1198e+03 2.0320e+02 2.2000e+01 3.5800e+02
 1.0718e+03 3.2600e+02 3.3500e+02 6.1190e+02 1.6500e+02 3.6250e+02
 3.8250e+02 2.0520e+02 1.2300e+02 9.3350e+02 4.7470e+02 1.2700e+02
 1.7310e+02 1.1220e+02 1.8900e+02 3.0450e+02 3.6100e+02 3.6190e+02
 3.3070e+02 9.9030e+02 9.9700e+02 3.8600e+01 1.0864e+03 7.4150e+02
 3.1380e+02 2.7310e+02 3.6300e+02 3.4600e+02 9.8000e+01 4.6870e+02
 3.8300e+01 9.3170e+02 3.1310e+02 7.2130e+02 1.1149e+03 1.0900e+01
 1.6840e+02 1.6470e+02 1.1244e+03 6.9420e+02 3.6850e+02 1.7700e+02
 5.8360e+02 1.6490e+02 8.8680e+02 5.8740e+02 8.3000e+01 5.1380e+02
 1.7800e+02 1.9500e+02 3.9460e+02]

has_availability = ['t' nan 'f']
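
`has_availability` holds the flags `'t'` and `'f'` plus `nan`. A minimal sketch of mapping such flags to booleans is shown below. The helper name `parse_flag` and the choice to return `None` for missing or unexpected values are assumptions.

```python
def parse_flag(value):
    """Hypothetical helper: map 't'/'f' to True/False, anything else to None."""
    mapping = {"t": True, "f": False}
    # dict.get returns None for nan or any unexpected value.
    return mapping.get(value)
```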

availability_30 = [ 0 29  4  1 10  3  8 28  5 23 14 17 30 16  6 11  9  7 24  2 12 13 20 25
 22 15 19 21 18 26 27]

availability_60 = [ 0 13 59 34 31 10  3 18 16 58 21  7 35 20 55 26 53  1  5 44 24 17 60 33
 47  6 36 11  4 38 40 56 14 41 27 23 28 39 37 12 32 43 25 45 42  2 29 30
 46 50 19 51 15 49  9  8 52 48 54 57 22]

availability_90 = [ 0 16 89 57  3 61  4 10 48 46 88 14 51 37 64 38 55 56 83  1 35 74 54 45
 90 52 63 77  6 66 33 13 31 30 68 70  9 44 71  5 53 58 69 67 50 65 28 62
 41 34  7 43 72  2 27 32 12 40 76 59 80 85 17 42 47 26 11 60 18 21  8 86
 73 36 49 20 84 79 23 22 75 24 15 87 19 81 29 39 78 25 82]

availability_365 = [  0  74 364  57 278 336 248 279 123 259 218  36 323 247  29 229  72 321
 363  97 239 141 217   4  64 313 206 236 246 358   1 310 216 349 189  88
 144 134  90 151 157 179 135 312 365 142 338  77 174 257 341 187 326 260
  79 138  89 154 328  65  10  91  30 158  70  99  56 319 345  54  94 199
 128 273 333 249 342  66 230 340 258  69 208 233  41 339 215 309 110   6
  43 102  35 129 184   2 280  83 315  14 303 300 293 318  27 307 178   5
 124  19 140 287  87 166 334 145 133 139 355 146 262 175 205 108 353 238
 198 268 117 329 201 305  11  71 173  51 308 291 220 152 241 266  60 155
 264 222  46  20 237  21 296 165  50 225 331  33  23  39 149 191 335 193
 180 219 348 234  92 324 119 120 214  47 317 344 232 320 181 286  61 105
 244 106 347 298 112  48 116 245 242  93  80  75 161 306  98 301 290 167
 118  59 322 346 207 243  53  67  63 325  13 107 164 160 159 359  31  40
 131 316 332  95 143  42  24  26 253 251  28 115 351  49 270 263 122 277
 281 169 274 337 125 267  81 231 137  68 255 362 224  12 188 182 170 100
 190 171 304 111 221 356 132 183  62  15  32 177 275 136 265  17  34 361
 127 272 114 299 223 352 252 289  84 210 302  37 228 147 354 212 185  22
 150   9 285 357 172 256 261  38  52 156 148  25  86  73 282 104 227 162
 311 330   3 103 200  16 121 168   7  18 204 196  44 295 271 360  58 250
  85   8 269 292 197 283 211 288 254 130 276 101 203  96 153  45 195  55
  78  76 163 186 226  82 284 314 240 327 176 113 294 235 109 343 213 350
 194 202 126 297 192 209]

number_of_reviews = [   6  169   42   30    1  113    8   67   89   23   61   24   18    4
  162   12   84   11    0   63   53  129   56   22   15   39   76    5
   38   45  103   54   66   37  122   85   65   86    9  829   21   43
  126   47   34   74  101  115  185   79  188   10    2   29   87    7
    3   20   27   14   75  211  532  148   91  605   40  136   13  331
   32   35   73  238   16  170   51   44  152  688  516  100  120   52
  112   50  133  137   64   55   60   26  116   33   19   49  219  199
   59   41  425  110  128  470  445  127   82  172   68   70  125  194
   81  151  173  222  613  269   95  248   80   90  593  244  431  105
  182   17  106  559   78   98  227  327   93   57   28  376   36   25
   71  180  191  161  393  108  243  131  175  489  203  228  119  533
  384  252  141  121   48  209   96  268  285  159   31  812  158   58
   88   69  265   97  149   46   83  305  279   92   72  147  592  118
   62  163  367  167  171  183  155  111  258  251   77  525  379  145
  482  232  311  213  292  201  504  304  314  349  340  179  234  380
  332  166  160  146  102  140  202  193  259  192  456  132  286  436
  181  382  368  178  139  174  464  164  507  226  154   94  218  190
  316  427  107  282  177  348  403  449  187  198  245  109  215  143
  255  329  135  104  271  229  189  354  246  224  323  184  372  247
  334  208  168  365  144  197   99  157  299  242  411  230  280  270
  210  424  250  124  176  418  114  404  573  261  260  256  196  257
  223  335  447  214  117  150  277  457  297  123  134  337  322  267
  317  212  221  142  156  306  231  676  399  138  339  130  296  439
  333  235  220  387  338  266  324  233  350  352  715  308  274  254
  524 1116  272  300  357  846  153  195  186  326  343  385  452  298
  240  273  346  264  313  206  344  294  426  390  310  275  690  494
  569  241  216  281  204  307  468  315  289  405  600  200  353  303
  432  361  239  378  342  607  541  586  165  459  263  395  448  225
  249  413  287  394  205  325  392  347  236  441  237  415  375  589
  542  531  278  443  356  421  309  207  359  253  301  276]

number_of_reviews_ltm = [  0   2   1   4   3  13  10   7  16  47   5   6  22  70  18 115  27  48
  19  64  28  29  12   9   8  11  30 113  15  57  52  50  60  20  82  44
  14  33  43  81  49  56  34  35  21  23  24  26  39  53  17  45  38  66
  46  40  69  62  32  37  25  41  72  36  59  67  80  31  54  63  79  89
  61  42 109  73 140  55  71  51  98 116 101  96  75  58 126 136  77  85
  65  90  76  86  93 176 103 122 118  94  91 112  78  68  99 108  74 114
  92 106  84  83 100  97  88 110 123 129 117 142 120]

number_of_reviews_l30d = [ 0  1  4  3  8  2  5  6  7 10 11  9 14 12 18 15 13 16]

first_review = ['7/19/2015' '8/20/2009' '1/5/2011' ... '8/21/2024' '9/5/2024' '8/14/2024']

last_review = ['8/7/2017' '8/27/2013' '9/1/2023' ... '12/14/2023' '1/30/2024'
 '2/26/2024']

review_scores_rating = [5.   4.84 4.79 4.93 4.64 4.75 4.18 4.17 4.42 4.63 4.88 4.85 4.61 4.83
 4.94 4.92 4.82 4.55 4.71  nan 4.7  4.95 4.76 4.69 4.8  4.12 4.97 4.5
 4.62 4.89 4.21 4.52 4.74 4.77 3.8  4.66 4.87 4.34 4.38 4.86 4.67 4.81
 4.22 4.   4.9  4.57 4.47 4.25 4.91 4.73 4.45 4.72 4.96 4.43 4.33 4.6
 4.54 4.27 4.59 4.98 4.29 4.56 4.44 4.78 4.58 4.46 4.53 4.4  4.65 4.41
 4.99 4.51 4.15 4.68 3.   4.06 4.39 4.48 4.49 4.2  3.2  4.07 3.5  4.31
 3.75 4.14 4.26 4.19 4.16 4.37 4.36 2.   4.13 4.24 4.11 4.05 3.43 4.23
 3.95 4.28 4.3  4.32 1.   4.1  3.91 4.08 4.35 3.6  3.67 3.38 3.71 2.67
 3.78 3.33 3.25 2.5  2.6  4.04 3.96 2.33 3.82 3.93 3.4  3.88 3.83 4.02
 3.94 3.89 3.9  3.47 3.7  3.86 3.62 2.75 3.44 4.09 1.5  3.79 3.63 2.43]

review_scores_accuracy = [5.   4.81 4.79  nan 4.65 4.88 4.51 4.49 4.3  4.8  4.96 4.69 4.72 4.75
 4.93 4.67 4.7  4.45 4.78 4.95 4.92 4.82 4.36 4.83 4.87 4.97 4.33 4.55
 4.89 4.85 4.6  4.71 4.52 4.57 4.91 4.5  4.9  4.56 4.77 4.61 4.84 4.94
 4.86 4.98 4.62 4.14 4.68 4.48 4.25 4.44 4.73 4.58 4.99 4.59 4.74 4.76
 4.   4.4  4.39 4.63 4.46 4.17 4.64 4.23 3.25 4.31 4.43 4.12 3.   4.42
 4.54 4.66 3.8  3.75 4.53 0.   4.38 4.47 3.5  4.13 3.67 4.07 4.29 1.
 4.11 4.22 4.37 4.41 4.28 3.57 3.95 4.19 4.34 4.06 4.2  3.88 4.21 3.6
 3.56 3.71 4.09 2.   4.15 4.27 3.92 4.24 3.91 4.32 4.18 3.86 2.6  3.96
 3.33 2.33 3.83 4.26 3.87 3.4  3.63 2.67 4.05 4.1  4.35 4.16 3.2  2.5
 3.93 4.08 2.8  3.85 3.44 3.81 2.29 3.38 3.89]

review_scores_cleanliness = [5.   4.89 4.79 4.87  nan 4.67 4.38 4.03 3.97 3.95 4.44 4.69 4.28 4.5
 4.91 4.58 4.86 4.36 4.13 4.88 4.84 4.98 4.96 4.48 4.54 4.76 4.6  4.26
 4.09 4.64 4.72 4.77 4.78 4.71 4.11 4.62 4.9  4.4  4.22 4.45 4.92 4.61
 4.68 3.7  4.   4.34 4.8  4.83 4.33 4.75 4.85 4.81 4.39 4.93 4.82 3.25
 4.57 3.9  4.63 4.66 4.94 4.53 4.29 3.75 4.43 4.47 4.59 3.63 4.35 4.25
 4.74 4.27 4.95 4.41 3.8  4.99 4.73 4.31 4.7  4.51 4.65 4.97 2.   3.
 4.07 4.23 4.37 4.49 4.55 3.5  3.94 4.42 4.04 4.1  4.17 3.67 4.52 3.88
 4.2  1.   4.56 4.24 3.86 4.15 3.93 4.14 4.46 2.2  4.3  3.77 4.06 1.5
 0.   2.88 3.47 2.5  4.18 4.32 3.58 4.21 3.92 3.96 3.89 3.73 4.16 3.38
 4.08 3.87 4.19 3.43 3.33 3.4  4.12 3.91 3.83 3.2  3.71 3.81 3.06 3.76
 3.57 3.82 3.29 3.6  3.56 3.64 2.75 2.67 3.55 2.6  2.33 3.36 3.34 3.62
 3.84 2.8  3.69 4.05 3.85 3.22 2.57 3.78]

review_scores_checkin = [5.   4.87 4.64  nan 4.95 4.88 4.79 4.63 4.5  4.8  4.94 4.89 4.92 4.98
 4.83 4.99 4.45 4.75 4.78 4.9  4.69 4.84 4.86 4.93 4.97 4.59 4.77 4.76
 4.91 4.67 4.96 4.81 4.6  4.85 4.36 4.44 4.82 4.29 4.33 4.   4.73 4.74
 4.68 4.38 4.49 3.   4.62 4.53 4.71 4.72 4.61 3.5  4.56 4.7  3.25 4.14
 4.25 4.65 4.43 2.   4.48 4.54 4.57 4.47 4.55 4.51 0.   4.39 4.66 1.
 4.13 4.46 4.09 4.05 4.58 4.32 4.52 4.17 3.67 4.2  4.27 3.14 4.31 4.42
 4.4  3.38 4.28 4.35 3.71 1.67 4.41 4.22 2.75 4.15 4.37 4.34 2.6  3.76
 4.12 2.33 4.3  3.73 4.26 4.08 3.4  4.21 2.67 4.11 3.75 4.23 3.33 3.84
 4.19 3.6  3.34 3.8  3.88 3.46 4.16 4.18 4.06 4.24 2.5  3.86 3.89 3.83
 4.1  3.93 3.92 3.56 3.94 3.97 2.86]

review_scores_communication = [5.   4.9  4.76  nan 4.96 4.63 4.84 4.69 4.8  4.86 4.88 4.67 4.92 4.97
 4.95 4.36 4.98 4.91 4.72 4.81 4.89 4.78 4.5  4.94 4.82 4.99 4.93 4.17
 4.83 4.6  4.62 4.87 4.79 4.4  4.71 4.55 4.56 4.77 4.43 4.85 4.75 4.7
 4.74 4.64 4.34 4.46 4.   4.68 4.73 3.5  3.75 4.14 4.59 4.44 2.   4.47
 4.54 4.66 4.31 3.67 4.61 3.   4.29 4.33 4.42 4.52 4.65 4.38 4.58 4.51
 4.45 4.25 4.53 4.37 1.   4.23 4.39 4.1  4.28 4.48 3.83 4.27 4.57 3.33
 4.32 4.3  4.13 4.2  2.5  4.35 3.71 4.08 4.15 3.25 4.11 4.49 4.18 3.8
 3.2  4.26 3.69 3.94 3.9  4.16 4.22 4.41 2.6  2.33 3.88 3.96 3.73 3.4
 4.24 3.56 3.63 4.09 3.92 4.21 4.19 3.6  3.78 4.04 1.5  3.86 2.71]

review_scores_location = [5.   4.92 4.86 4.87  nan 4.58 4.88 4.95 4.85 4.75 4.94 4.98 4.54 4.82
 4.93 4.76 4.62 4.97 4.81 4.14 4.56 4.8  4.53 4.65 4.67 4.4  4.79 4.55
 4.41 4.9  4.64 4.68 4.89 4.17 4.78 4.5  4.73 4.11 4.33 4.74 4.96 4.2
 4.   4.91 4.99 4.83 4.6  4.77 4.59 4.71 4.84 4.66 4.42 4.7  4.25 4.69
 3.5  4.63 4.57 2.5  4.19 4.21 4.39 4.29 4.31 4.52 4.61 3.   4.43 4.44
 4.46 4.72 4.36 3.67 3.75 4.3  4.45 4.47 4.49 4.16 4.08 1.   4.38 3.83
 4.22 4.51 3.64 4.32 4.06 4.48 3.97 4.26 2.   4.28 4.18 3.47 4.07 3.85
 3.7  4.15 4.23 4.34 4.24 4.27 4.37 4.35 2.6  3.33 3.4  3.56 4.1  4.03
 3.9  3.38 3.2  3.57 4.13 3.89 1.5  3.63 3.6  3.82]

review_scores_value = [5.   4.83 4.67 4.87  nan 4.69 4.5  4.23 4.21 4.25 4.75 4.44 4.63 4.92
 4.78 4.64 4.38 4.66 4.82 4.7  4.84 4.8  4.68 4.51 4.65 4.94 4.86 4.79
 4.36 4.72 4.88 4.6  4.4  4.48 3.71 4.56 4.81 3.83 4.39 4.77 4.24 4.55
 4.73 4.54 4.89 4.52 3.67 4.93 4.28 4.85 4.   4.47 4.91 4.33 4.45 4.46
 4.96 4.62 4.59 4.74 4.53 4.26 4.06 4.57 4.35 4.95 4.9  4.71 4.76 4.08
 4.34 4.2  3.5  4.58 4.97 4.61 4.32 4.17 4.43 4.42 2.75 4.29 3.   4.49
 4.3  4.37 4.13 0.   4.07 4.19 4.14 4.41 4.98 3.63 4.12 4.22 1.   3.88
 3.86 4.27 3.29 4.18 4.11 3.93 3.91 4.31 4.09 3.99 3.6  4.04 3.75 3.94
 3.33 3.4  3.38 2.5  4.02 2.   3.25 3.56 3.85 3.92 4.05 3.8  2.6  4.1
 2.33 3.79 3.78 4.99 3.81 3.9  3.89 4.16 3.96 3.7  4.15 2.67 3.82 3.2
 3.77 3.44 1.5  3.22 2.43 3.57]

license = [nan 'STR-2009-FXRRPD' 'STR-2303-FPCPHQ' ... 'STR-2405-GRDKVT'
 'STR-2305-HSTBHY' 'STR-2308-FKJVHP']

instant_bookable = ['f' 't']

calculated_host_listings_count = [  1   2   5   4   9   6   3  18   7 101  33  10  16   8  12  15  21  11
  17  36  22  24  54  14  13  34  20  62  28  30  25  32  47  19  37  23
  51  95  46  92  27]

calculated_host_listings_count_entire_homes = [  1   5   4   7   0   2   3  12   6 101  10  15  11   9   8  31  22  16
  24  54  13  34  20  62  28  30  25  19  17  18  95]

calculated_host_listings_count_private_rooms = [ 0  1  2  4  5  6 21 10  3  9  7 16  8 14 15 11 12 17 39 23 20 13 37 51
 29 92 18]

calculated_host_listings_count_shared_rooms = [0 1 2 6 4 3 5 7 8]

reviews_per_month = [5.000e-02 9.200e-01 2.500e-01 1.700e-01 1.000e-02 6.600e-01 6.000e-02
 4.000e-01 5.300e-01 1.400e-01 3.600e-01 2.400e-01 1.100e-01 2.000e-02
 1.230e+00 8.000e-02 5.200e-01 1.000e-01       nan 3.400e-01 8.200e-01
 5.700e-01 9.000e-02 3.500e-01 4.900e-01 4.000e-02 2.800e-01 2.900e-01
 7.000e-01 4.300e-01 8.000e-01 5.600e-01 4.400e-01 9.800e-01 7.000e-02
 5.470e+00 3.200e-01 8.600e-01 2.300e-01 5.000e-01 6.900e-01 7.900e-01
 1.240e+00 5.400e-01 1.200e-01 7.600e-01 1.270e+00 2.700e-01 2.000e-01
 5.900e-01 2.100e-01 3.000e-02 1.800e-01 1.900e-01 9.300e-01 6.500e-01
 1.490e+00 3.890e+00 1.180e+00 6.800e-01 3.300e-01 4.490e+00 1.500e-01
 8.300e-01 9.700e-01 2.480e+00 4.800e-01 3.910e+00 2.600e-01 1.300e-01
 8.400e-01 1.710e+00 1.600e-01 1.220e+00 3.700e-01 4.100e-01 1.110e+00
 3.100e-01 4.980e+00 3.810e+00 3.900e-01 7.300e-01 1.060e+00 1.100e+00
 9.400e-01 7.500e-01 5.500e-01 1.630e+00 2.330e+00 8.800e-01 3.180e+00
 9.600e-01 3.510e+00 3.330e+00 4.600e-01 1.380e+00 1.460e+00 6.200e-01
 1.770e+00 4.700e-01 1.750e+00 4.670e+00 2.660e+00 2.060e+00 7.200e-01
 2.200e-01 2.520e+00 5.100e-01 7.100e-01 1.000e+00 5.220e+00 1.920e+00
 4.500e-01 3.400e+00 2.310e+00 1.910e+00 4.470e+00 6.700e-01 1.810e+00
 2.670e+00 1.360e+00 6.400e-01 1.020e+00 3.070e+00 1.430e+00 5.800e-01
 1.780e+00 7.800e-01 2.020e+00 1.560e+00 1.310e+00 3.230e+00 3.800e-01
 1.990e+00 8.700e-01 1.070e+00 1.640e+00 4.010e+00 1.670e+00 1.900e+00
 1.050e+00 4.370e+00 1.590e+00 3.150e+00 2.500e+00 2.080e+00 1.160e+00
 1.720e+00 1.030e+00 1.120e+00 4.200e-01 6.300e-01 2.230e+00 2.370e+00
 1.330e+00 3.000e-01 1.260e+00 6.810e+00 8.100e-01 2.270e+00 1.280e+00
 1.130e+00 2.590e+00 9.000e-01 1.250e+00 2.150e+00 5.090e+00 1.090e+00
 6.100e-01 2.700e+00 1.940e+00 1.420e+00 1.080e+00 2.350e+00 1.170e+00
 1.010e+00 2.200e+00 7.420e+00 4.650e+00 1.370e+00 1.290e+00 4.190e+00
 2.720e+00 2.900e+00 2.630e+00 2.960e+00 1.300e+00 4.450e+00 2.770e+00
 4.680e+00 3.410e+00 3.010e+00 1.140e+00 1.600e+00 3.370e+00 5.440e+00
 1.450e+00 3.550e+00 2.950e+00 1.680e+00 1.470e+00 1.510e+00 9.100e-01
 2.690e+00 4.410e+00 1.740e+00 4.080e+00 1.190e+00 1.040e+00 2.560e+00
 6.000e-01 2.840e+00 1.650e+00 1.530e+00 1.620e+00 3.450e+00 3.300e+00
 3.190e+00 1.520e+00 2.540e+00 4.180e+00 1.480e+00 4.570e+00 2.050e+00
 1.150e+00 1.390e+00 1.320e+00 1.550e+00 1.660e+00 2.010e+00 1.820e+00
 8.500e-01 4.020e+00 7.400e-01 1.410e+00 2.580e+00 9.500e-01 2.190e+00
 3.220e+00 3.730e+00 1.970e+00 4.170e+00 3.080e+00 1.570e+00 1.350e+00
 1.850e+00 9.900e-01 1.730e+00 2.470e+00 1.790e+00 7.700e-01 2.320e+00
 8.900e-01 1.200e+00 2.730e+00 2.420e+00 2.600e+00 1.580e+00 3.530e+00
 2.440e+00 2.170e+00 3.120e+00 3.610e+00 2.410e+00 2.290e+00 3.270e+00
 1.930e+00 2.000e+00 2.300e+00 3.680e+00 1.610e+00 2.620e+00 1.950e+00
 2.460e+00 2.970e+00 2.400e+00 4.130e+00 2.140e+00 1.830e+00 3.920e+00
 1.890e+00 2.130e+00 2.250e+00 3.940e+00 4.600e+00 4.380e+00 2.510e+00
 2.070e+00 1.880e+00 6.420e+00 1.870e+00 1.440e+00 4.100e+00 2.390e+00
 5.820e+00 2.650e+00 2.530e+00 2.740e+00 3.170e+00 2.030e+00 3.440e+00
 4.580e+00 2.240e+00 1.700e+00 2.340e+00 1.540e+00 3.110e+00 4.700e+00
 3.050e+00 1.690e+00 3.480e+00 1.840e+00 2.430e+00 2.090e+00 3.280e+00
 3.390e+00 2.040e+00 1.400e+00 5.240e+00 2.210e+00 1.760e+00 3.200e+00
 7.070e+00 6.580e+00 3.560e+00 1.800e+00 3.670e+00 1.500e+00 2.760e+00
 3.760e+00 3.160e+00 2.360e+00 4.740e+00 5.620e+00 2.820e+00 2.100e+00
 2.380e+00 4.220e+00 1.960e+00 2.160e+00 2.910e+00 3.700e+00 2.570e+00
 4.030e+00 2.550e+00 3.850e+00 7.900e+00 3.030e+00 5.800e+00 1.245e+01
 3.060e+00 1.210e+00 3.950e+00 3.350e+00 3.990e+00 2.180e+00 4.270e+00
 1.980e+00 4.330e+00 2.260e+00 9.590e+00 3.250e+00 2.120e+00 3.690e+00
 4.940e+00 3.840e+00 3.520e+00 3.980e+00 1.860e+00 2.780e+00 2.490e+00
 4.060e+00 3.660e+00 2.880e+00 5.180e+00 3.470e+00 4.720e+00 5.060e+00
 1.340e+00 2.930e+00 8.380e+00 6.400e+00 6.910e+00 3.420e+00 3.380e+00
 9.640e+00 5.890e+00 4.250e+00 3.970e+00 5.100e+00 3.130e+00 7.660e+00
 2.640e+00 4.560e+00 5.550e+00 2.990e+00 2.450e+00 3.490e+00 3.630e+00
 4.750e+00 3.720e+00 3.600e+00 5.050e+00 8.470e+00 2.710e+00 4.200e+00
 7.410e+00 8.340e+00 2.790e+00 4.140e+00 5.530e+00 6.240e+00 3.430e+00
 3.460e+00 5.410e+00 4.770e+00 2.830e+00 4.320e+00 2.680e+00 3.290e+00
 2.110e+00 2.890e+00 4.970e+00 3.140e+00 3.740e+00 6.250e+00 4.390e+00
 6.050e+00 4.610e+00 2.280e+00 2.220e+00 4.160e+00 5.120e+00 4.550e+00
 6.130e+00 4.880e+00 2.920e+00 3.040e+00 6.980e+00 5.040e+00 4.820e+00
 4.350e+00 3.800e+00 2.610e+00 7.060e+00 3.790e+00 6.700e+00 2.850e+00
 7.430e+00 6.460e+00 9.670e+00 9.020e+00 8.730e+00 1.176e+01 3.930e+00
 3.780e+00 4.420e+00 5.880e+00 6.340e+00 2.810e+00 6.750e+00 5.280e+00
 3.540e+00 3.880e+00 4.850e+00 3.090e+00 9.100e+00 3.360e+00 7.550e+00
 8.100e+00 5.570e+00 5.190e+00 6.820e+00 8.080e+00 6.920e+00 4.900e+00
 5.140e+00 3.570e+00 6.890e+00 6.320e+00 6.290e+00 4.910e+00 3.340e+00
 6.090e+00 4.050e+00 4.710e+00 8.660e+00 6.360e+00 4.430e+00 5.450e+00
 3.960e+00 4.780e+00 5.510e+00 4.500e+00 3.310e+00 5.230e+00 7.540e+00
 2.940e+00 4.640e+00 6.030e+00 5.600e+00 4.120e+00 2.750e+00 7.960e+00
 9.040e+00 4.460e+00 7.760e+00 5.590e+00 5.250e+00 5.970e+00 2.870e+00
 5.870e+00 3.260e+00 2.860e+00 3.000e+00 6.150e+00 7.000e+00 3.210e+00
 3.580e+00 7.580e+00 3.640e+00 4.090e+00 7.270e+00 3.900e+00 4.070e+00
 4.260e+00 8.780e+00 7.670e+00 4.730e+00 7.870e+00 4.520e+00 3.620e+00
 4.040e+00 6.470e+00 4.540e+00 4.230e+00 5.400e+00 6.960e+00 2.980e+00
 5.650e+00 5.200e+00 6.140e+00 5.940e+00 3.710e+00 4.150e+00 4.290e+00
 5.030e+00 7.400e+00 9.130e+00 6.000e+00 3.240e+00 5.700e+00 7.010e+00
 6.620e+00 6.930e+00 3.320e+00 1.033e+01 7.100e+00 7.260e+00 8.930e+00
 3.820e+00 3.860e+00 4.870e+00 5.310e+00 7.330e+00 3.100e+00 4.400e+00
 6.180e+00 8.140e+00 4.300e+00 5.750e+00 2.800e+00 4.510e+00 4.210e+00
 5.480e+00 6.280e+00 5.170e+00 4.340e+00 4.590e+00 4.110e+00 5.160e+00
 5.000e+00 6.410e+00 6.630e+00 1.071e+01 6.720e+00 3.020e+00 6.270e+00
 5.020e+00 7.240e+00 5.080e+00 4.890e+00 8.710e+00 8.510e+00 9.970e+00
 7.700e+00 3.750e+00 3.590e+00 6.310e+00 4.360e+00 3.870e+00 5.320e+00
 4.000e+00 5.390e+00 9.510e+00 5.460e+00 6.260e+00 5.930e+00 3.500e+00
 8.740e+00 1.077e+01 5.690e+00 9.650e+00 8.180e+00 5.980e+00 4.830e+00
 6.740e+00 6.190e+00 4.480e+00 3.650e+00 4.800e+00 5.500e+00 4.240e+00
 6.660e+00 8.520e+00 5.300e+00 5.810e+00 5.540e+00 6.080e+00 6.690e+00
 4.930e+00 4.690e+00 5.840e+00 4.660e+00 5.150e+00 6.640e+00 8.540e+00
 5.520e+00 6.800e+00 4.630e+00 6.590e+00 5.670e+00 3.830e+00 5.290e+00
 5.210e+00 5.260e+00 1.030e+01 5.430e+00 6.100e+00 8.150e+00 8.000e+00
 6.860e+00 4.530e+00 1.080e+01 6.060e+00 9.320e+00 8.460e+00 7.160e+00
 7.440e+00 9.230e+00 6.010e+00 1.127e+01 9.290e+00 5.270e+00 5.380e+00
 5.920e+00 8.090e+00 6.680e+00 6.170e+00 1.061e+01 5.370e+00 5.010e+00
 7.340e+00 6.730e+00 5.340e+00 6.160e+00 6.610e+00 5.640e+00 5.360e+00
 5.790e+00 5.110e+00 5.730e+00 5.580e+00 5.490e+00 6.040e+00 7.720e+00
 4.760e+00 5.770e+00 4.310e+00 7.170e+00 6.790e+00 6.490e+00 5.720e+00
 5.130e+00 4.790e+00 4.620e+00 6.780e+00 5.330e+00 6.210e+00 6.070e+00
 5.420e+00 1.088e+01 5.900e+00 5.860e+00 5.710e+00 4.920e+00 5.070e+00
 5.560e+00 4.840e+00 9.630e+00 7.280e+00 9.300e+00 7.210e+00 8.210e+00
 8.600e+00 6.370e+00 8.980e+00 7.740e+00 6.380e+00 4.440e+00 6.520e+00
 4.860e+00 1.125e+01 8.020e+00 5.830e+00 7.180e+00 3.770e+00 5.950e+00
 8.950e+00 8.770e+00 1.000e+01 7.850e+00 7.370e+00 5.740e+00 8.670e+00
 8.350e+00 7.220e+00 5.680e+00 6.390e+00 9.500e+00 5.760e+00 9.310e+00
 6.450e+00 9.080e+00 8.320e+00 6.600e+00 6.200e+00 1.135e+01 7.020e+00
 8.640e+00 7.770e+00 1.106e+01 7.500e+00 5.630e+00 7.750e+00 6.990e+00
 1.011e+01 6.840e+00 9.730e+00 7.320e+00 7.560e+00 9.780e+00 6.770e+00
 9.820e+00 8.840e+00 9.360e+00 9.750e+00 8.260e+00 7.920e+00 6.300e+00
 4.950e+00 9.270e+00 8.570e+00 6.650e+00 8.610e+00 8.830e+00 1.233e+01
 1.057e+01 8.280e+00 1.027e+01 6.120e+00 6.530e+00 7.890e+00 1.133e+01
 7.310e+00 6.500e+00 9.250e+00 5.910e+00 9.410e+00 8.480e+00 8.860e+00
 6.550e+00 7.940e+00 9.190e+00 1.009e+01 1.120e+01 6.850e+00 5.850e+00
 7.570e+00 6.510e+00 5.660e+00 7.090e+00 9.000e+00 1.018e+01 8.620e+00
 9.620e+00 8.450e+00 8.440e+00 7.350e+00 1.209e+01 1.054e+01 1.078e+01
 6.430e+00 7.140e+00 9.610e+00 1.219e+01 7.860e+00 1.459e+01 9.890e+00
 1.132e+01 8.400e+00 1.037e+01 1.031e+01 9.110e+00 9.560e+00 1.063e+01
 7.110e+00 8.130e+00 1.193e+01 8.110e+00 1.100e+01 1.111e+01 7.200e+00
 7.590e+00 7.230e+00 7.780e+00 7.950e+00 8.360e+00 6.670e+00 6.540e+00
 6.560e+00 1.119e+01 1.019e+01 8.870e+00 1.167e+01 9.850e+00 1.238e+01
 7.380e+00 1.095e+01 8.330e+00 9.470e+00 7.290e+00 7.830e+00 9.340e+00
 1.091e+01 9.150e+00 1.103e+01 8.490e+00 1.548e+01 9.070e+00 7.360e+00
 1.286e+01 6.110e+00 8.050e+00 9.060e+00 8.680e+00 1.149e+01 7.690e+00
 1.102e+01 9.770e+00 9.380e+00 1.043e+01 1.098e+01 8.250e+00 1.200e+01
 7.300e+00 1.400e+01 8.820e+00]

Cleaning Data¶

  • host_since: This feature records the date on which the host joined, which tells us how long the host has been in business. A model cannot use a raw date directly, but the feature may still correlate with price, so we convert it to the number of months since the host joined.
In [10]:
# Converting the dates to datetime objects
listings['host_since'] = pd.to_datetime(listings['host_since'])

# Reference date for the comparison. Note: this is the date at run time, not a fixed date;
# use datetime.datetime(2024, 11, 20) instead to pin it to 20 November 2024
today_date = datetime.datetime.today()

# Replacing each date with the number of months between it and the reference date
# (vectorized, so there is no row-by-row chained assignment; missing dates stay NaN)
listings['host_since'] = ((today_date.year - listings['host_since'].dt.year) * 12
                          + (today_date.month - listings['host_since'].dt.month)).astype(float)
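
The month arithmetic above can be checked by hand on a couple of dates. This is only a minimal sketch: the helper name `months_between` and the sample dates are hypothetical, and the reference date is pinned to 20 November 2024 purely so that the result is reproducible.

```python
import datetime


def months_between(since, reference):
    """Calendar months from `since` to `reference`; the day of the month is ignored."""
    return (reference.year - since.year) * 12 + (reference.month - since.month)


# Assumed reference date, pinned so the numbers below do not change from run to run
reference = datetime.date(2024, 11, 20)

print(months_between(datetime.date(2015, 7, 19), reference))   # 112
print(months_between(datetime.date(2024, 10, 31), reference))  # 1
```

Because only the year and month are compared, a host who joined on 31 October counts as one month old on 20 November, even though fewer than 30 days have passed.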
  • host_response_time: This feature has only a few categorical values, so we can map them to ordered numbers (a faster response gets a higher number).
In [11]:
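# Assigning the result back instead of calling replace(..., inplace=True) on a column selection
listings['host_response_time'] = listings['host_response_time'].replace(
    {'within a few hours': 2, 'within an hour': 3,
     'within a day': 1, 'a few days or more': 0})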

listings['host_response_time'].unique()
Out[11]:
array([nan,  2.,  3.,  1.,  0.])

host_response_rate: The host's response rate, stored as a percentage string. It is related to the response time feature above, but before deciding anything we first convert it to a number.

In [12]:
# Removing the '%' sign and converting to float (missing values stay NaN)
listings['host_response_rate'] = (listings['host_response_rate']
                                  .str.replace('%', '', regex=False)
                                  .astype(float))

listings['host_response_rate'].unique()
Out[12]:
array([ nan, 100.,  77.,  50.,  88.,  80.,   0.,  97.,  33.,  90.,  86.,
        94.,  96.,  75.,  67.,  91.,  98.,  69.,  60.,  40.,  92.,  95.,
        25.,  70.,  20.,  30.,  76.,  83.,  89.,  78.,  93.,  99.,  79.,
        71.,  85.,  65.,  10.,  73.,   8.,  63.,  82.,  57.,  13.,  14.,
        17.,  45.,   6.,  74.,  47.,  87.,   9.,  26.,  81.,  55.,  62.,
        27.,  58.,  84.,  22.,  46.,  64.,  29.])
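
A small worked example of the percentage conversion, on made-up values, may make the step easier to follow. The sample Series is hypothetical; the point is only that stripping `%` and casting to float leaves missing entries as NaN.

```python
import pandas as pd

# Hypothetical sample values, including one missing entry
rates = pd.Series(['100%', '77%', None, '0%'])

# Strip the literal '%' character, then cast to float; the missing entry becomes NaN
rates_numeric = rates.str.replace('%', '', regex=False).astype(float)

print(rates_numeric)
```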

host_acceptance_rate: Similar to the response rate, this feature gives the host's acceptance rate.

In [13]:
# Removing the '%' sign and converting to float (missing values stay NaN)
listings['host_acceptance_rate'] = (listings['host_acceptance_rate']
                                    .str.replace('%', '', regex=False)
                                    .astype(float))

listings['host_acceptance_rate'].unique()
Out[13]:
array([ nan,  38., 100.,  60.,  62.,  94.,  89.,  50.,   0.,  96.,  86.,
        83.,  46.,  42.,  75.,  95.,  92.,  80.,  67.,  82.,  40.,  98.,
        97.,  71.,  87.,  73.,  69.,  78.,  93.,  61.,  76.,  91.,  37.,
        90.,  88.,  66.,  84.,  99.,  65.,  74.,  33.,  17.,  77.,  85.,
        79.,  56.,  70.,  59.,  31.,  68.,  14.,  63.,  20.,  25.,  28.,
        48.,  81.,  43.,  29.,  64.,  51.,  53.,  22.,  49.,  44.,  15.,
        30.,  27.,  24.,  39.,  58.,  35.,  21.,  72.,  57.,  55.,  36.,
        11.,  34.,  47.,  18.,  52.,   8.,   5.,  13.,  54.,  41.,  23.,
        12.,  26.,  45.,   9.,  32.,  16.,  10.,   2.,   7.])

host_is_superhost: This feature holds clean values, but as strings ('t' and 'f'). It should be boolean, so we convert it to 1 and 0.

In [14]:
listings['host_is_superhost'] = listings['host_is_superhost'].replace({'f': 0, 't': 1})

listings['host_is_superhost'].unique()
Out[14]:
array([ 0.,  1., nan])

host_verifications: This feature also has a limited set of categorical values, so we map each combination to a number.

In [15]:
listings['host_verifications'] = listings['host_verifications'].replace({
    "['email', 'phone', 'work_email']": 7, 
    "['email', 'phone']": 6, 
    "['phone', 'work_email']": 5, 
    "['email', 'work_email']": 4, 
    "['phone']": 3, 
    "['work_email']": 2, 
    "['email']": 1, 
    '[]': 0
})

(listings['host_verifications'].unique())
Out[15]:
array([ 6.,  7.,  3.,  5.,  1.,  2.,  0.,  4., nan])

host_has_profile_pic:

In [16]:
listings['host_has_profile_pic'] = listings['host_has_profile_pic'].replace({'f': 0, 't': 1})

listings['host_has_profile_pic'].unique()
Out[16]:
array([ 1.,  0., nan])

host_identity_verified:

In [17]:
listings['host_identity_verified'] = listings['host_identity_verified'].replace({'f': 0, 't': 1})

listings['host_identity_verified'].unique()
Out[17]:
array([ 1.,  0., nan])

room_type:

In [18]:
listings['room_type'] = listings['room_type'].replace({'Entire home/apt': 2, 'Private room': 1, 'Shared room': 0})

listings['room_type'].unique()
Out[18]:
array([2, 1, 0], dtype=int64)

bathrooms & bathrooms_text: These two features carry the same information, so we only need one of them. We cannot simply drop bathrooms_text, though, because bathrooms has many NaN values where bathrooms_text is populated. We therefore use bathrooms_text to fill the gaps in bathrooms and drop it afterwards.

In [19]:
# Filling the bathrooms feature from bathrooms_text where it is empty/null

# The first token of the text (e.g. '1.5' in '1.5 baths') is the count;
# anything that is not a number becomes NaN
parsed_bathrooms = pd.to_numeric(listings['bathrooms_text'].str.split().str[0], errors = 'coerce')

# Only missing bathrooms values are filled; existing values are kept
listings['bathrooms'] = listings['bathrooms'].fillna(parsed_bathrooms)
listings.drop(columns = ['bathrooms_text'], inplace = True)

listings['bathrooms'].unique()
Out[19]:
array([3. , 1.5, 1. , 0.5, 2. , 0. , 2.5, 4. , 5. , 3.5, 4.5, nan, 5.5,
       6.5, 6. , 8. ])
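
A small worked example of the fill logic on made-up rows. The sample frame and its text values are hypothetical; the sketch only illustrates that an existing `bathrooms` value is kept, a leading number in the text is used when `bathrooms` is missing, and text without a leading number stays missing.

```python
import pandas as pd

# Hypothetical rows: one already filled, three with bathrooms missing
sample = pd.DataFrame({
    'bathrooms': [1.0, None, None, None],
    'bathrooms_text': ['1 bath', '1.5 shared baths', 'Half-bath', None],
})

# First whitespace-separated token of the text; non-numeric tokens become NaN
first_token = sample['bathrooms_text'].str.split().str[0]
parsed = pd.to_numeric(first_token, errors='coerce')

# Fill only where bathrooms is missing
sample['bathrooms'] = sample['bathrooms'].fillna(parsed)

print(sample['bathrooms'])
```

In this sketch the text-only row remains NaN, which matches the behaviour of catching `ValueError` in a row-by-row version.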

amenities: This feature holds the list of amenities available in each listing. For now we replace it with the number of amenities per listing.

In [20]:
# Replacing the amenities feature with the number of amenities available, for summarized modeling

# Each value is a list stored as a string, so it is parsed with ast.literal_eval before counting
listings['amenities'] = listings['amenities'].apply(lambda s: len(ast.literal_eval(s))).astype(float)

listings['amenities'].unique()
Out[20]:
array([ 13.,  10.,  43.,  47.,  34.,  57.,  27.,  37.,  25.,  32.,  26.,
         9.,  17.,  11.,  64.,  31.,  53.,  12.,  50.,  28.,  38.,  62.,
        69.,  35.,   7.,  63.,  61.,  59.,  44.,  52.,  73.,  22.,  16.,
        65.,  36.,  41.,  58.,  18.,  29.,  30.,  48.,  24.,  20.,  42.,
        15.,  19.,   8.,  21.,  49.,   0.,  86.,  51.,   5.,   6.,  33.,
        55.,  45.,  56.,  54.,  40.,  39.,  46.,  60.,  68.,  23.,  14.,
        74.,  66.,   2.,   4.,  83.,  71.,  72.,   3.,  79.,  77.,  76.,
        70.,  99.,  75.,   1.,  67.,  87.,  85.,  81.,  78.,  80.,  82.,
        91.,  90., 104.,  93., 103.,  84.,  94.])
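
To make the counting step concrete, here is a minimal sketch on a single made-up value. The string below is hypothetical; it only assumes that each entry is a list written out as text, which is what `ast.literal_eval` expects.

```python
import ast

# Hypothetical raw value: a list of amenities stored as one string
raw = '["Wifi", "Kitchen", "Heating"]'

# literal_eval safely turns the text into a real Python list
amenities = ast.literal_eval(raw)
n_amenities = len(amenities)

# An empty list is counted as zero amenities
n_empty = len(ast.literal_eval('[]'))

print(n_amenities, n_empty)  # 3 0
```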

price: This is our target feature. The raw values contain characters we do not need (dollar sign, commas, spaces), so we strip them with a regular expression and then convert the result to a number.

In [21]:
# Raw string avoids the invalid-escape warning for '\$'; values that still cannot be parsed become NaN
listings['price'] = pd.to_numeric(listings['price'].replace(r'[\$, ]', '', regex = True), errors = 'coerce')

listings['price'].head(10)
Out[21]:
0      NaN
1      NaN
2    172.0
3     75.0
4      NaN
5      NaN
6     79.0
7    126.0
8    148.0
9     90.0
Name: price, dtype: float64
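
The same cleaning rule can be sketched for a single value with the standard library. The helper name `parse_price` and the sample strings are hypothetical; the sketch assumes prices look like `$1,250.00`.

```python
import math
import re


def parse_price(text):
    """Strip '$', commas and spaces, then convert to float; return NaN on failure."""
    if not isinstance(text, str):
        return float('nan')
    cleaned = re.sub(r'[\$, ]', '', text)
    try:
        return float(cleaned)
    except ValueError:
        return float('nan')


print(parse_price('$172.00'))    # 172.0
print(parse_price('$1,250.00'))  # 1250.0
print(math.isnan(parse_price(None)))  # True
```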

has_availability:

In [22]:
listings['has_availability'] = listings['has_availability'].replace({'f': 0, 't': 1})

listings['has_availability'].unique()
Out[22]:
array([ 1., nan,  0.])

first_review & last_review: These two features are dates. We are not working with a time series, so a raw date feature does not make much sense, and we are not yet sure whether they are needed or how they should be transformed. We drop them for now; if they turn out to be useful, we will add them back and rerun the analysis.

In [23]:
listings.drop(columns = ['first_review', 'last_review'], inplace = True)

license: Although the license denotes a specific type of home, we remove it for now. If it turns out to be needed, we will include it again later in the analysis.

In [24]:
listings.drop(columns = ['license'], inplace = True)

instant_bookable:

In [25]:
listings['instant_bookable'] = listings['instant_bookable'].replace({'f': 0, 't': 1})

listings['instant_bookable'].unique()
Out[25]:
array([0, 1], dtype=int64)

Now we are done with the basic transformations and can move on to the next step of the analysis (categorical encoding).

Encoding Data¶

In [26]:
# Features that are still of type object need categorical encoding before we can move forward

# Categorical Features
cat_cols = listings.select_dtypes(include = ['object'])

# Creating an object of Label Encoder
label_encoder = LabelEncoder()

# Looping in order to encode all the categorical features
for col in cat_cols:
    listings[col] = label_encoder.fit_transform(listings[col].astype(str))
In [27]:
listings.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21825 entries, 0 to 21824
Data columns (total 51 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   host_since                                    21823 non-null  float64
 1   host_response_time                            15741 non-null  float64
 2   host_response_rate                            15741 non-null  float64
 3   host_acceptance_rate                          16297 non-null  float64
 4   host_is_superhost                             20914 non-null  float64
 5   host_neighbourhood                            21825 non-null  int32  
 6   host_listings_count                           21823 non-null  float64
 7   host_total_listings_count                     21823 non-null  float64
 8   host_verifications                            21823 non-null  float64
 9   host_has_profile_pic                          21823 non-null  float64
 10  host_identity_verified                        21823 non-null  float64
 11  neighbourhood_cleansed                        21825 non-null  int32  
 12  latitude                                      21825 non-null  float64
 13  longitude                                     21825 non-null  float64
 14  property_type                                 21825 non-null  int32  
 15  room_type                                     21825 non-null  int64  
 16  accommodates                                  21825 non-null  int64  
 17  bathrooms                                     21800 non-null  float64
 18  bedrooms                                      20185 non-null  float64
 19  beds                                          16519 non-null  float64
 20  amenities                                     21825 non-null  float64
 21  price                                         16536 non-null  float64
 22  minimum_nights                                21825 non-null  int64  
 23  maximum_nights                                21825 non-null  int64  
 24  minimum_minimum_nights                        21825 non-null  int64  
 25  maximum_minimum_nights                        21825 non-null  int64  
 26  minimum_maximum_nights                        21825 non-null  int64  
 27  maximum_maximum_nights                        21825 non-null  int64  
 28  minimum_nights_avg_ntm                        21825 non-null  float64
 29  maximum_nights_avg_ntm                        21825 non-null  float64
 30  has_availability                              20774 non-null  float64
 31  availability_30                               21825 non-null  int64  
 32  availability_60                               21825 non-null  int64  
 33  availability_90                               21825 non-null  int64  
 34  availability_365                              21825 non-null  int64  
 35  number_of_reviews                             21825 non-null  int64  
 36  number_of_reviews_ltm                         21825 non-null  int64  
 37  number_of_reviews_l30d                        21825 non-null  int64  
 38  review_scores_rating                          16610 non-null  float64
 39  review_scores_accuracy                        16608 non-null  float64
 40  review_scores_cleanliness                     16608 non-null  float64
 41  review_scores_checkin                         16608 non-null  float64
 42  review_scores_communication                   16608 non-null  float64
 43  review_scores_location                        16607 non-null  float64
 44  review_scores_value                           16608 non-null  float64
 45  instant_bookable                              21825 non-null  int64  
 46  calculated_host_listings_count                21825 non-null  int64  
 47  calculated_host_listings_count_entire_homes   21825 non-null  int64  
 48  calculated_host_listings_count_private_rooms  21825 non-null  int64  
 49  calculated_host_listings_count_shared_rooms   21825 non-null  int64  
 50  reviews_per_month                             16610 non-null  float64
dtypes: float64(28), int32(3), int64(20)
memory usage: 8.2 MB

As the summary shows, the data no longer contains any object-type columns. The next step is cleaning the data, starting with the missing values.

In [28]:
transformed_data = listings.copy()

Distribution Overview¶

Dealing with Missing Values¶

In [29]:
# Getting all the features which hold missing values
missing_features = transformed_data.columns[transformed_data.isnull().any()]

missing_features
Out[29]:
Index(['host_since', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'bathrooms',
       'bedrooms', 'beds', 'price', 'has_availability', 'review_scores_rating',
       'review_scores_accuracy', 'review_scores_cleanliness',
       'review_scores_checkin', 'review_scores_communication',
       'review_scores_location', 'review_scores_value', 'reviews_per_month'],
      dtype='object')

Let's check the distribution of every feature that has missing values in order to decide which values we should fill in.

In [30]:
# Number of plots
n_plots = len(missing_features)

# Calculating the number of rows needed for the subplots
n_rows = (n_plots // 3) + (n_plots % 3 > 0)

# Setting the subplots
fig, axes = plt.subplots(n_rows, 3, figsize = (20, n_rows * 5))

# Flattening the axes array for easier iteration
axes = axes.flatten()

# Looping through the missing features and creating a distribution plot for all the features
for i, feature in enumerate(missing_features):
    # Location of the subplot
    ax = axes[i]
    
    # Visualization - Histogram
    sns.histplot(transformed_data[feature].dropna(), kde = True, ax = ax, color = 'skyblue', bins = 30)
    
    # Getting the value of mean & median
    mean_val = transformed_data[feature].mean()
    median_val = transformed_data[feature].median()

    # Adding mean & median vertical lines
    ax.axvline(mean_val, color = 'red', linestyle = '--', label = f'Mean: {mean_val:.2f}')
    ax.axvline(median_val, color = 'green', linestyle = '--', label = f'Median: {median_val:.2f}')
    
    # Labeling
    ax.set_title(f'Distribution of {feature}')
    ax.set_xlabel(feature)
    ax.set_ylabel('Frequency')
    
    # Legend
    ax.legend()

# Removing any unused subplot at the end
for i in range(n_plots, len(axes)):
    fig.delaxes(axes[i])

# Adjusting Layout for better spacing
plt.tight_layout()

# Showing
plt.show()
[Figure: grid of histograms, one per feature with missing values, each titled "Distribution of <feature>" with mean and median lines]
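
The row count used for the subplot grid above is a ceiling division of the number of plots by the number of columns. A minimal sketch of that arithmetic, where `grid_rows` is a hypothetical helper name that does not appear in the notebook:

```python
import math


def grid_rows(n_plots: int, n_cols: int = 3) -> int:
    """Number of rows needed to place n_plots panels in n_cols columns."""
    return math.ceil(n_plots / n_cols)


# 23 features carry missing values in the Index output above
n_plots = 23
rows = grid_rows(n_plots)

# Same result as the expression used in the notebook cell
rows_notebook_style = (n_plots // 3) + (n_plots % 3 > 0)

# Panels left over in the last row, which the notebook removes with fig.delaxes
unused_axes = rows * 3 - n_plots

print(rows, rows_notebook_style, unused_axes)
```

Both expressions agree; the `math.ceil` form may simply be easier to read when the column count changes.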

The visualization above only gives an overview of the features' distributions. Below, we define a function that can be called to produce the same plot for a single feature anywhere later in the code:

Visualization Function¶

In [31]:
def visualize_feature(feature, data):
    # Setting the figure size
    plt.figure(figsize = (8, 4))

    # Visualization - Histogram
    sns.histplot(data[feature].dropna(), kde = True, color = 'skyblue', bins = 30)

    # Getting the value of mean & median
    mean_val = data[feature].mean()
    median_val = data[feature].median()

    # Adding mean & median vertical lines
    plt.axvline(mean_val, color = 'red', linestyle = '--', label = f'Mean: {mean_val:.2f}')
    plt.axvline(median_val, color = 'green', linestyle = '--', label = f'Median: {median_val:.2f}')

    # Labeling
    plt.title(f'Distribution of {feature}')
    plt.xlabel(feature)
    plt.ylabel('Frequency')

    # Legend
    plt.legend()

    # Adjusting Layout for better spacing
    plt.tight_layout()

    # Showing
    plt.show()

Before moving forward, we will fill in the missing values of our Target variable price.

Imputing Target Variable¶

Price: The other dataset, calendar.csv, contains the price of every listing by date. So, we will fill the NaN values of our target variable price with the most recent calendar price of each listing.

In [32]:
# Reading calendar data
calendar = pd.read_csv('./Data-AirBNB/calendar.csv')

# Converting the feature 'date' to datetime for filtering ahead
calendar['date'] = pd.to_datetime(calendar['date'])

# Filtering the data to get the latest records only
calendar = calendar.sort_values(by = ['listing_id', 'date'], ascending = [True, False])

# Now, we remove the duplicate rows: this keeps the first (latest) row per listing and deletes further repetitions
calendar = calendar.drop_duplicates(subset = 'listing_id', keep = 'first')

# Cleaning the price values (raw string so that '\$' is a valid regex escape)
calendar['price'] = pd.to_numeric(calendar['price'].replace({r'\$': '', ',': '', ' ': ''}, regex = True), errors = 'coerce')

calendar.head(5)
Out[32]:
listing_id date available price adjusted_price minimum_nights maximum_nights
364 1419 2025-09-05 f 469.0 NaN 28.0 730.0
729 8077 2025-09-05 f 75.0 NaN 180.0 365.0
1094 26654 2025-09-05 t 155.0 NaN 28.0 1125.0
2338 27423 2025-09-05 f 75.0 NaN 90.0 365.0
3582 30931 2025-09-05 f 100.0 NaN 180.0 365.0
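
The two calendar steps above (keep the latest row per listing, then turn price strings into numbers) can be checked on a small frame. This is a sketch on made-up values; the toy rows and the name `latest` are assumptions for illustration only:

```python
import pandas as pd

# Toy calendar rows; the values are invented for illustration
toy = pd.DataFrame({
    "listing_id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2025-09-04", "2025-09-05", "2025-09-04", "2025-09-05"]),
    "price": ["$1,200.00", "$1,250.00", "$75.00", "n/a"],
})

# Keep the most recent row per listing: sort newest first, then drop repeats
latest = (
    toy.sort_values(["listing_id", "date"], ascending=[True, False])
       .drop_duplicates(subset="listing_id", keep="first")
       .copy()
)

# Strip currency symbols, thousands separators and whitespace, then convert;
# anything that still is not a number becomes NaN
latest["price"] = pd.to_numeric(
    latest["price"].str.replace(r"[$,\s]", "", regex=True), errors="coerce"
)

print(latest)
```

Note that taking only the latest row means a listing whose most recent price is unparseable ends up as NaN even when an earlier date had a valid price.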
In [33]:
# Getting the IDs back from the main source
transformed_data['listing_id'] = df['id']

# Filling the missing prices from the calendar (listing_id -> latest price)
# without chained assignment; a listing absent from the calendar stays NaN
price_lookup = calendar.set_index('listing_id')['price']
transformed_data['price'] = transformed_data['price'].fillna(transformed_data['listing_id'].map(price_lookup))

# Dropping the id feature again as not needed anymore
transformed_data = transformed_data.drop(columns = ['listing_id']).reset_index(drop = True)

transformed_data['price']
Out[33]:
0        469.0
1         75.0
2        172.0
3         75.0
4        100.0
         ...  
21820    350.0
21821     89.0
21822    170.0
21823    150.0
21824    245.0
Name: price, Length: 21825, dtype: float64
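
Filling a target column from a per-listing lookup can be sanity-checked on toy data. The sketch below assumes one price per `listing_id` in the lookup; the frames `listings_toy` and `calendar_toy` are hypothetical and not part of the notebook:

```python
import numpy as np
import pandas as pd

# Toy listings: one known price and two missing prices
listings_toy = pd.DataFrame({
    "listing_id": [10, 20, 30],
    "price": [100.0, np.nan, np.nan],
})

# Toy calendar with one (latest) price per listing; listing 30 is absent
calendar_toy = pd.DataFrame({
    "listing_id": [10, 20],
    "price": [999.0, 80.0],
})

# Series indexed by listing_id, used as a lookup table
lookup = calendar_toy.set_index("listing_id")["price"]

# Only NaN prices are filled; existing prices are left untouched
listings_toy["price"] = listings_toy["price"].fillna(
    listings_toy["listing_id"].map(lookup)
)

print(listings_toy)
```

Listing 10 keeps its original price, listing 20 takes the calendar price, and listing 30 remains NaN because the lookup has no entry for it.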

Let's create a function that fits a baseline model (random forest). We will run it every time we change a feature (imputing, dropping, etc.) and check each time whether the model improves.

Model Check Function¶

In [34]:
def model_check(data):
    
    # Dropping every row that still has a missing value in any column
    model_data = data.dropna()
    # Separating features (X) and target (y)
    X = model_data.drop('price', axis = 1)
    y = model_data['price']
    
    # Splitting the data (test & train)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

    # Creating random forest object
    rf_model = RandomForestRegressor(n_estimators = 200, random_state = 42, n_jobs = -1, oob_score = False, max_features = 11, bootstrap = False)
    
    # Fitting/Training the model
    rf_model.fit(X_train, y_train)
    
    # Predicting
    y_pred_rf = rf_model.predict(X_test)
    
    # Metrics
    mse_rf = mean_squared_error(y_test, y_pred_rf)
    mae_rf = mean_absolute_error(y_test, y_pred_rf)
    rmse_rf = np.sqrt(mse_rf)
    r2_rf = r2_score(y_test, y_pred_rf)
    
    return f"MSE: {mse_rf}, MAE: {mae_rf}, RMSE: {rmse_rf}, R2: {r2_rf}"
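
To make the four metrics returned above easier to read, here is a small worked example that computes them by hand with NumPy. The prices are invented for illustration; the formulas are the standard definitions of MSE, MAE, RMSE and R^2:

```python
import numpy as np

# Invented actual and predicted prices
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 220.0])

errors = y_true - y_pred

# Mean squared error and mean absolute error
mse = np.mean(errors ** 2)
mae = np.mean(np.abs(errors))

# RMSE is the square root of MSE, so it is in the same unit as price
rmse = np.sqrt(mse)

# R^2 compares the squared error against that of always predicting the mean
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse}, MAE: {mae}, RMSE: {rmse:.2f}, R2: {r2:.3f}")
```

Because RMSE is derived from MSE, the two always move together; MAE and R^2 add information about typical error size and explained variance.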

Let's also create an imputation function that returns every candidate treatment of a feature, so that each one can be passed to the model check:

Imputation Function¶

In [35]:
def impute_feature(feature, data):
    # Copying the data into 3 DFs
    df_mean_imputed = data.copy()
    df_median_imputed = data.copy()
    df_multiple_imputed = data.copy()

    # Mean Imputation
    mean_value = df_mean_imputed[feature].mean()
    df_mean_imputed[feature] = df_mean_imputed[feature].fillna(mean_value)

    # Median Imputation
    median_value = df_median_imputed[feature].median()
    df_median_imputed[feature] = df_median_imputed[feature].fillna(median_value)

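    # "Multiple" imputation using IterativeImputer. Note: fitted on a single
    # column it has no other features to learn from, so it falls back to its
    # initial strategy (the mean by default) and matches the mean imputation
    imputer = IterativeImputer()
    df_multiple_imputed[feature] = imputer.fit_transform(
        df_multiple_imputed[[feature]]
    ).ravel()

    # Removing the feature
    feature_drop = data.drop(columns = feature)

    # Dropping the rows where the feature is missing
    rows_drop = data.dropna(subset = [feature]).reset_index(drop = True)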

    # Returning the DFs
    return df_mean_imputed, df_median_imputed, df_multiple_imputed, feature_drop, rows_drop
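
The function above passes a single column to `IterativeImputer`. A sketch on toy data suggests why the "Multiple" results in this notebook match the "Mean" results: with one column there is nothing to regress on, so the imputer returns its initial (mean) fill. The toy frame and the two-column variant are assumptions for illustration, not the notebook's method:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data where x is exactly half of y, with one missing x
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
})

# Single column: the imputer has no predictors and falls back to the mean of x
single = IterativeImputer(random_state=0).fit_transform(toy[["x"]]).ravel()

# Two columns: the missing x is predicted from y
both = IterativeImputer(random_state=0).fit_transform(toy[["x", "y"]])[:, 0]

print("single-column fill:", single[-1])
print("two-column fill:   ", both[-1])
```

If a model-based fill is wanted, one option would be to pass several related columns to the imputer and keep only the imputed feature, as in the two-column call.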

Now that price is imputed and the model check function has been created, let's move forward and analyze the other features for imputation. First, we will check the null counts:

In [36]:
# Checking null values

transformed_data.isnull().sum()
Out[36]:
host_since                                         2
host_response_time                              6084
host_response_rate                              6084
host_acceptance_rate                            5528
host_is_superhost                                911
host_neighbourhood                                 0
host_listings_count                                2
host_total_listings_count                          2
host_verifications                                 2
host_has_profile_pic                               2
host_identity_verified                             2
neighbourhood_cleansed                             0
latitude                                           0
longitude                                          0
property_type                                      0
room_type                                          0
accommodates                                       0
bathrooms                                         25
bedrooms                                        1640
beds                                            5306
amenities                                          0
price                                              0
minimum_nights                                     0
maximum_nights                                     0
minimum_minimum_nights                             0
maximum_minimum_nights                             0
minimum_maximum_nights                             0
maximum_maximum_nights                             0
minimum_nights_avg_ntm                             0
maximum_nights_avg_ntm                             0
has_availability                                1051
availability_30                                    0
availability_60                                    0
availability_90                                    0
availability_365                                   0
number_of_reviews                                  0
number_of_reviews_ltm                              0
number_of_reviews_l30d                             0
review_scores_rating                            5215
review_scores_accuracy                          5217
review_scores_cleanliness                       5217
review_scores_checkin                           5217
review_scores_communication                     5217
review_scores_location                          5218
review_scores_value                             5217
instant_bookable                                   0
calculated_host_listings_count                     0
calculated_host_listings_count_entire_homes        0
calculated_host_listings_count_private_rooms       0
calculated_host_listings_count_shared_rooms        0
reviews_per_month                               5215
dtype: int64
In [37]:
# Getting the percentage of null values

transformed_data.apply(lambda x: f"{round((x.isnull().sum() / len(transformed_data)) * 100, 2)} %")
Out[37]:
host_since                                       0.01 %
host_response_time                              27.88 %
host_response_rate                              27.88 %
host_acceptance_rate                            25.33 %
host_is_superhost                                4.17 %
host_neighbourhood                                0.0 %
host_listings_count                              0.01 %
host_total_listings_count                        0.01 %
host_verifications                               0.01 %
host_has_profile_pic                             0.01 %
host_identity_verified                           0.01 %
neighbourhood_cleansed                            0.0 %
latitude                                          0.0 %
longitude                                         0.0 %
property_type                                     0.0 %
room_type                                         0.0 %
accommodates                                      0.0 %
bathrooms                                        0.11 %
bedrooms                                         7.51 %
beds                                            24.31 %
amenities                                         0.0 %
price                                             0.0 %
minimum_nights                                    0.0 %
maximum_nights                                    0.0 %
minimum_minimum_nights                            0.0 %
maximum_minimum_nights                            0.0 %
minimum_maximum_nights                            0.0 %
maximum_maximum_nights                            0.0 %
minimum_nights_avg_ntm                            0.0 %
maximum_nights_avg_ntm                            0.0 %
has_availability                                 4.82 %
availability_30                                   0.0 %
availability_60                                   0.0 %
availability_90                                   0.0 %
availability_365                                  0.0 %
number_of_reviews                                 0.0 %
number_of_reviews_ltm                             0.0 %
number_of_reviews_l30d                            0.0 %
review_scores_rating                            23.89 %
review_scores_accuracy                           23.9 %
review_scores_cleanliness                        23.9 %
review_scores_checkin                            23.9 %
review_scores_communication                      23.9 %
review_scores_location                          23.91 %
review_scores_value                              23.9 %
instant_bookable                                  0.0 %
calculated_host_listings_count                    0.0 %
calculated_host_listings_count_entire_homes       0.0 %
calculated_host_listings_count_private_rooms      0.0 %
calculated_host_listings_count_shared_rooms       0.0 %
reviews_per_month                               23.89 %
dtype: object
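
The percentages above are formatted as strings, which is convenient to read but cannot be filtered numerically. A minimal sketch on a toy frame that keeps the shares numeric and selects columns above a threshold (the frame and the 5% cut-off variable are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy frame: 'a' has 1 of 4 values missing, 'c' has 2 of 4, 'b' has none
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1, 2, 3, 4],
    "c": [np.nan, np.nan, 1.0, 2.0],
})

# isnull().mean() is the fraction of missing values per column
missing_pct = toy.isnull().mean().mul(100).round(2)

# Columns whose missing share exceeds the threshold
threshold = 5
over_threshold = missing_pct[missing_pct > threshold].index.tolist()

print(missing_pct)
print(over_threshold)
```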

Transforming Data¶

Imputing & Dropping¶

First, we will start with the features that have few missing values (under 5%, plus bedrooms at 7.51%) and decide case by case whether to impute or remove:

  • host_since
  • host_is_superhost
  • host_listings_count
  • host_total_listings_count
  • host_verifications
  • host_has_profile_pic
  • host_identity_verified
  • bathrooms
  • bedrooms
  • has_availability
In [38]:
# Making another variable for cleaning data

clean_data = transformed_data.copy()

host_since

In [39]:
visualize_feature('host_since', clean_data)
[Figure: Distribution of host_since (histogram with mean and median lines)]
In [40]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_since', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10504.57746358039, MAE: 53.746289498141266, RMSE: 102.4918409610267, R2: 0.6293463782730535
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739

We imputed the missing values of the feature host_since with three methods (mean, median and multiple) and passed each result to the model to check the effect of the imputation. All three gave the same R^2 of about 0.64, so any of them can be used here. We will perform the same steps for all the remaining features and check for changes in the model.
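
One plausible reason for the identical scores, which the notebook does not confirm, is that `model_check` calls `dropna()` on the whole frame: if the rows missing host_since are also missing another feature, they are removed before training regardless of how host_since was filled. A toy sketch of that effect, with invented values:

```python
import numpy as np
import pandas as pd

# Row 1 is missing both host_since and host_response_rate
toy = pd.DataFrame({
    "host_since": [2010.0, np.nan, 2015.0, 2018.0],
    "host_response_rate": [0.9, np.nan, 1.0, 0.8],
    "price": [100.0, 120.0, 90.0, 150.0],
})

# Impute only host_since, as impute_feature does for one feature at a time
imputed = toy.copy()
imputed["host_since"] = imputed["host_since"].fillna(imputed["host_since"].mean())

# The model check drops any row with a remaining NaN in any column
rows_before = toy.dropna()
rows_after = imputed.dropna()

# Same rows reach the model either way
print(rows_before.equals(rows_after))
```

Under this assumption, the effect of an imputation only becomes visible once the other features with missing values have been handled as well.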

In [41]:
clean_data = mult_imp.copy()

host_is_superhost

In [42]:
visualize_feature('host_is_superhost', clean_data)
[Figure: Distribution of host_is_superhost (histogram with mean and median lines)]
In [43]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_is_superhost', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 12539.781172376543, MAE: 51.86006613756614, RMSE: 111.98116436426504, R2: 0.6073229930286392
Median:- MSE: 12892.292865112433, MAE: 51.24221119929454, RMSE: 113.5442330773009, R2: 0.5962842647986072
Multiple:- MSE: 12539.781172376543, MAE: 51.86006613756614, RMSE: 111.98116436426504, R2: 0.6073229930286392
Feature Drop:- MSE: 13048.224654673722, MAE: 51.895185185185184, RMSE: 114.22882584826705, R2: 0.5914013384081931
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
In [44]:
clean_data = rows_drop.copy()

As we can see, imputing lowered the model's R^2 (about 0.60 against 0.64), and removing the feature lowers it even further (0.59). So, based on the model_check output, we dropped the rows with missing values for this feature for now.
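
The choice made for host_is_superhost can also be expressed in code. The sketch below copies the R^2 values printed in the output above into a dictionary and picks the highest one; `r2_by_option` and `best_option` are hypothetical names used only here:

```python
# R^2 values copied from the host_is_superhost output above
r2_by_option = {
    "Mean": 0.6073229930286392,
    "Median": 0.5962842647986072,
    "Multiple": 0.6073229930286392,
    "Feature Drop": 0.5914013384081931,
    "Rows Drop": 0.6413249870208739,
}

# Option with the highest R^2
best_option = max(r2_by_option, key=r2_by_option.get)

# Gap between the best option and the best imputation
gap_to_mean = r2_by_option[best_option] - r2_by_option["Mean"]

print(best_option, round(gap_to_mean, 4))
```

Keep in mind that the options are scored on different sets of rows, so the comparison is a rough guide rather than a strict like-for-like test.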

Let's move ahead and follow the same rules for all the features¶

host_listings_count

In [45]:
visualize_feature('host_listings_count', clean_data)
[Figure: Distribution of host_listings_count (histogram with mean and median lines)]
In [46]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_listings_count', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10377.001284293681, MAE: 53.28509293680297, RMSE: 101.86756738183985, R2: 0.6338478989732101
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
In [47]:
clean_data = mult_imp.copy()

host_total_listings_count

In [48]:
visualize_feature('host_total_listings_count', clean_data)
[Figure: Distribution of host_total_listings_count (histogram with mean and median lines)]
In [49]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_total_listings_count', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10668.001517646375, MAE: 53.62444005576207, RMSE: 103.28601801621735, R2: 0.6235799666561306
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
In [50]:
clean_data = mult_imp.copy()

host_verifications

In [51]:
visualize_feature('host_verifications', clean_data)
[Figure: Distribution of host_verifications (histogram with mean and median lines)]
In [52]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_verifications', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10352.554493285315, MAE: 53.39951208178439, RMSE: 101.74750362188409, R2: 0.6347105030768276
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
In [53]:
clean_data = mult_imp.copy()

host_has_profile_pic

In [54]:
visualize_feature('host_has_profile_pic', clean_data)
[Figure: Distribution of host_has_profile_pic (histogram with mean and median lines)]
In [55]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_has_profile_pic', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
In [56]:
clean_data = feature_drop.copy()

This is the first feature for which dropping the feature raised the R^2 slightly (0.6458 against 0.6413). So, we will drop the feature and move ahead for now.

host_identity_verified

In [57]:
visualize_feature('host_identity_verified', clean_data)
[Figure: Distribution of host_identity_verified (histogram with mean and median lines)]
In [58]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_identity_verified', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Median:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Multiple:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Feature Drop:- MSE: 10524.695340032527, MAE: 53.50630576208179, RMSE: 102.58993781084249, R2: 0.628636519757155
Rows Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
In [59]:
clean_data = mult_imp.copy()

bathrooms

In [60]:
visualize_feature('bathrooms', clean_data)
[Figure: Distribution of bathrooms (histogram with mean and median lines)]
In [61]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('bathrooms', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 43091.50518031817, MAE: 54.76129354389224, RMSE: 207.5849348587661, R2: 0.30160163899510406
Median:- MSE: 43316.96363250116, MAE: 55.43455875522527, RMSE: 208.1272774830372, R2: 0.2979475588505279
Multiple:- MSE: 43091.50518031817, MAE: 54.76129354389224, RMSE: 207.5849348587661, R2: 0.30160163899510406
Feature Drop:- MSE: 43278.16783970041, MAE: 56.51264050162564, RMSE: 208.03405451920705, R2: 0.29857633517181326
Rows Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
In [62]:
clean_data = rows_drop.copy()

bedrooms

In [63]:
visualize_feature('bedrooms', clean_data)
[Figure: Distribution of bedrooms (histogram with mean and median lines)]
In [64]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('bedrooms', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Median:- MSE: 9123.686382698561, MAE: 51.599275429633074, RMSE: 95.51798983803292, R2: 0.6395566315145446
Multiple:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Feature Drop:- MSE: 10330.488675998606, MAE: 53.64445192754297, RMSE: 101.63901158511237, R2: 0.5918803014164484
Rows Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
In [65]:
clean_data = mult_imp.copy()

has_availability

In [66]:
visualize_feature('has_availability', clean_data)
[Figure: Distribution of has_availability (histogram with mean and median lines)]
In [67]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('has_availability', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 18090.303518500466, MAE: 55.08571494893222, RMSE: 134.5001989533862, R2: 0.5413275797874368
Median:- MSE: 17699.388780280875, MAE: 54.86184540389973, RMSE: 133.03904983229876, R2: 0.5512390668386394
Multiple:- MSE: 18090.303518500466, MAE: 55.08571494893222, RMSE: 134.5001989533862, R2: 0.5413275797874368
Feature Drop:- MSE: 17591.500776949862, MAE: 54.855868152274844, RMSE: 132.63295509393532, R2: 0.5539745240712464
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
In [68]:
clean_data = rows_drop.copy()

Finally, we are done handling the features with a small share of missing values (the < 5% group above).

In [69]:
# Getting the percentage of null values

clean_data.apply(lambda x: f"{round((x.isnull().sum() / len(clean_data)) * 100, 2)} %")
Out[69]:
host_since                                        0.0 %
host_response_time                              25.29 %
host_response_rate                              25.29 %
host_acceptance_rate                            22.66 %
host_is_superhost                                 0.0 %
host_neighbourhood                                0.0 %
host_listings_count                               0.0 %
host_total_listings_count                         0.0 %
host_verifications                                0.0 %
host_identity_verified                            0.0 %
neighbourhood_cleansed                            0.0 %
latitude                                          0.0 %
longitude                                         0.0 %
property_type                                     0.0 %
room_type                                         0.0 %
accommodates                                      0.0 %
bathrooms                                         0.0 %
bedrooms                                          0.0 %
beds                                            21.44 %
amenities                                         0.0 %
price                                             0.0 %
minimum_nights                                    0.0 %
maximum_nights                                    0.0 %
minimum_minimum_nights                            0.0 %
maximum_minimum_nights                            0.0 %
minimum_maximum_nights                            0.0 %
maximum_maximum_nights                            0.0 %
minimum_nights_avg_ntm                            0.0 %
maximum_nights_avg_ntm                            0.0 %
has_availability                                  0.0 %
availability_30                                   0.0 %
availability_60                                   0.0 %
availability_90                                   0.0 %
availability_365                                  0.0 %
number_of_reviews                                 0.0 %
number_of_reviews_ltm                             0.0 %
number_of_reviews_l30d                            0.0 %
review_scores_rating                            22.08 %
review_scores_accuracy                          22.08 %
review_scores_cleanliness                       22.09 %
review_scores_checkin                           22.08 %
review_scores_communication                     22.08 %
review_scores_location                          22.08 %
review_scores_value                             22.08 %
instant_bookable                                  0.0 %
calculated_host_listings_count                    0.0 %
calculated_host_listings_count_entire_homes       0.0 %
calculated_host_listings_count_private_rooms      0.0 %
calculated_host_listings_count_shared_rooms       0.0 %
reviews_per_month                               22.08 %
dtype: object
In [70]:
clean_data.shape
Out[70]:
(19853, 50)
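
The shape above can be compared with the starting point of 21825 rows and 51 columns reported earlier in the notebook. A short arithmetic sketch of how much data the row and feature drops have removed so far:

```python
# Shape before the cleaning steps (from the earlier info output)
rows_start, cols_start = 21825, 51

# Shape after the first group of features (from clean_data.shape)
rows_now, cols_now = 19853, 50

# Absolute and relative loss of rows
rows_dropped = rows_start - rows_now
pct_dropped = rows_dropped / rows_start * 100

# Columns removed so far (host_has_profile_pic)
cols_dropped = cols_start - cols_now

print(f"Rows dropped: {rows_dropped} ({pct_dropped:.2f} %), columns dropped: {cols_dropped}")
```

Tracking this number after each group helps to see how much of the data is lost as further rows are dropped.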

Now, we will move forward and deal with the features that have more than 5% missing values:

  • host_response_time
  • host_response_rate
  • host_acceptance_rate
  • beds
  • review_scores_rating
  • review_scores_accuracy
  • review_scores_cleanliness
  • review_scores_checkin
  • review_scores_communication
  • review_scores_location
  • review_scores_value
  • reviews_per_month

We split the imputation into two groups (< 5% and > 5% missing values) only to keep the code clearer; the process is still the same. We will run the same logic again, check the model metrics and decide on the imputation method accordingly.

host_response_time

In [71]:
visualize_feature('host_response_time', clean_data)
[Figure: Distribution of host_response_time (histogram with mean and median lines)]
In [72]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_response_time', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Median:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Multiple:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Feature Drop:- MSE: 9136.766535415698, MAE: 52.369986065954485, RMSE: 95.58643489227798, R2: 0.6390398826799277
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
In [73]:
clean_data = mult_imp.copy()

host_response_rate

In [74]:
visualize_feature('host_response_rate', clean_data)
[Figure: Distribution of host_response_rate (histogram with mean and median lines)]
In [75]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_response_rate', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 61299.810237428566, MAE: 57.039771428571434, RMSE: 247.58798484059878, R2: 0.2825718038541226
Median:- MSE: 60836.26963080219, MAE: 57.09521098901098, RMSE: 246.65009554184687, R2: 0.2879968957094523
Multiple:- MSE: 61299.810237428566, MAE: 57.039771428571434, RMSE: 247.58798484059878, R2: 0.2825718038541226
Feature Drop:- MSE: 60246.38296568132, MAE: 57.316863736263734, RMSE: 245.4513861555508, R2: 0.29490069075300973
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
In [76]:
clean_data = rows_drop.copy()

host_acceptance_rate

In [77]:
visualize_feature('host_acceptance_rate', clean_data)
[Figure: Distribution of host_acceptance_rate (histogram with mean and median lines)]
In [78]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_acceptance_rate', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10192.225307390512, MAE: 51.38265054744526, RMSE: 100.95655158230451, R2: 0.6327382915757953
Median:- MSE: 10223.960468476278, MAE: 51.263270985401455, RMSE: 101.1136017975637, R2: 0.6315947621574429
Multiple:- MSE: 10192.225307390512, MAE: 51.38265054744526, RMSE: 100.95655158230451, R2: 0.6327382915757953
Feature Drop:- MSE: 10292.263703695255, MAE: 51.86634580291971, RMSE: 101.45079449514063, R2: 0.6291335564736134
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
In [79]:
clean_data = rows_drop.copy()

beds

In [80]:
visualize_feature('beds', clean_data)
[Figure: visualize_feature output for beds]
In [81]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('beds', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 32286.720948833754, MAE: 62.22030958439356, RMSE: 179.6850604497596, R2: 0.4253728795932271
Median:- MSE: 33354.02798757421, MAE: 63.586925360474964, RMSE: 182.63085168605608, R2: 0.40637734358840205
Multiple:- MSE: 32286.720948833754, MAE: 62.22030958439356, RMSE: 179.6850604497596, R2: 0.4253728795932271
Feature Drop:- MSE: 33077.59752327184, MAE: 62.44091391009329, RMSE: 181.87247599148102, R2: 0.41129715077311146
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
In [82]:
clean_data = rows_drop.copy()

review_scores_rating

In [83]:
visualize_feature('review_scores_rating', clean_data)
[Figure: visualize_feature output for review_scores_rating]
In [84]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_rating', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Median:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Multiple:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Feature Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
In [85]:
clean_data = feature_drop.copy()

Dropping review_scores_rating raised R2 slightly (from 0.6475 to 0.6581), while every imputation option and the rows-drop option left it unchanged, so we keep the feature-drop result here.
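
The choice between strategies is made above by reading the printed R2 values. As a hedged sketch, the same comparison could be automated by parsing the strings that model_check returns. The helpers `parse_r2` and `best_strategy` are hypothetical names, they assume the "R2: <number>" format seen in the outputs, and the metric strings below are rounded copies of the review_scores_rating results.

```python
import re


def parse_r2(metrics: str) -> float:
    """Extract the R2 value from a string such as 'MSE: ..., R2: 0.64'."""
    match = re.search(r"R2:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", metrics)
    if match is None:
        raise ValueError(f"no R2 value found in: {metrics!r}")
    return float(match.group(1))


def best_strategy(results: dict) -> str:
    """Return the strategy name with the highest R2 (first one wins on ties)."""
    return max(results, key=lambda name: parse_r2(results[name]))


# Rounded copies of the review_scores_rating outputs (illustrative only).
results = {
    "Mean": "MSE: 8922.61, MAE: 51.72, RMSE: 94.46, R2: 0.6475",
    "Median": "MSE: 8922.61, MAE: 51.72, RMSE: 94.46, R2: 0.6475",
    "Multiple": "MSE: 8922.61, MAE: 51.72, RMSE: 94.46, R2: 0.6475",
    "Feature Drop": "MSE: 8654.79, MAE: 51.13, RMSE: 93.03, R2: 0.6581",
    "Rows Drop": "MSE: 8922.61, MAE: 51.72, RMSE: 94.46, R2: 0.6475",
}

winner = best_strategy(results)
print(winner)
```

When several strategies tie, as the imputation options often do here, `max` keeps the first name in the dictionary, so the insertion order doubles as a preference order.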

review_scores_accuracy

In [86]:
visualize_feature('review_scores_accuracy', clean_data)
[Figure: visualize_feature output for review_scores_accuracy]
In [87]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_accuracy', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9597.270765873202, MAE: 52.24906874129122, RMSE: 97.9656611567196, R2: 0.6208470504117486
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [88]:
clean_data = mult_imp.copy()

review_scores_cleanliness

In [89]:
visualize_feature('review_scores_cleanliness', clean_data)
[Figure: visualize_feature output for review_scores_cleanliness]
In [90]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_cleanliness', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9162.663587888992, MAE: 51.75394101254064, RMSE: 95.72180309568448, R2: 0.6380167851691431
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [91]:
clean_data = mult_imp.copy()

review_scores_checkin

In [92]:
visualize_feature('review_scores_checkin', clean_data)
[Figure: visualize_feature output for review_scores_checkin]
In [93]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_checkin', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9066.90647520901, MAE: 51.788892243381326, RMSE: 95.22030495230001, R2: 0.6417997972985676
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [94]:
clean_data = mult_imp.copy()

review_scores_communication

In [95]:
visualize_feature('review_scores_communication', clean_data)
[Figure: visualize_feature output for review_scores_communication]
In [96]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_communication', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9272.672662157456, MAE: 52.085520204366, RMSE: 96.29471772718095, R2: 0.6336707303366924
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [97]:
clean_data = mult_imp.copy()

review_scores_location

In [98]:
visualize_feature('review_scores_location', clean_data)
[Figure: visualize_feature output for review_scores_location]
In [99]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_location', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9165.49740116117, MAE: 51.61564328843474, RMSE: 95.73660429094595, R2: 0.6379048316057874
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [100]:
clean_data = mult_imp.copy()

review_scores_value

In [101]:
visualize_feature('review_scores_value', clean_data)
[Figure: visualize_feature output for review_scores_value]
In [102]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_value', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9447.420761054344, MAE: 52.02047840222944, RMSE: 97.19784339713686, R2: 0.6267670742090259
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [103]:
clean_data = mult_imp.copy()

reviews_per_month

In [104]:
visualize_feature('reviews_per_month', clean_data)
[Figure: visualize_feature output for reviews_per_month]
In [105]:
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('reviews_per_month', clean_data)

print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 46768.35899868723, MAE: 56.3973073974703, RMSE: 216.25993387284487, R2: 0.28517903402287204
Median:- MSE: 46311.822850325785, MAE: 55.6945611345343, RMSE: 215.2018188824755, R2: 0.2921568629987479
Multiple:- MSE: 46768.35899868723, MAE: 56.3973073974703, RMSE: 216.25993387284487, R2: 0.28517903402287204
Feature Drop:- MSE: 46549.10039349367, MAE: 56.307974319662705, RMSE: 215.7524053017571, R2: 0.28853024521177983
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
In [106]:
clean_data = rows_drop.copy()

That completes the imputation, row-dropping, and feature-dropping steps. The cleaned data has 10,763 rows, 49 columns, and no missing values:

In [107]:
clean_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10763 entries, 0 to 10762
Data columns (total 49 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   host_since                                    10763 non-null  float64
 1   host_response_time                            10763 non-null  float64
 2   host_response_rate                            10763 non-null  float64
 3   host_acceptance_rate                          10763 non-null  float64
 4   host_is_superhost                             10763 non-null  float64
 5   host_neighbourhood                            10763 non-null  int32  
 6   host_listings_count                           10763 non-null  float64
 7   host_total_listings_count                     10763 non-null  float64
 8   host_verifications                            10763 non-null  float64
 9   host_identity_verified                        10763 non-null  float64
 10  neighbourhood_cleansed                        10763 non-null  int32  
 11  latitude                                      10763 non-null  float64
 12  longitude                                     10763 non-null  float64
 13  property_type                                 10763 non-null  int32  
 14  room_type                                     10763 non-null  int64  
 15  accommodates                                  10763 non-null  int64  
 16  bathrooms                                     10763 non-null  float64
 17  bedrooms                                      10763 non-null  float64
 18  beds                                          10763 non-null  float64
 19  amenities                                     10763 non-null  float64
 20  price                                         10763 non-null  float64
 21  minimum_nights                                10763 non-null  int64  
 22  maximum_nights                                10763 non-null  int64  
 23  minimum_minimum_nights                        10763 non-null  int64  
 24  maximum_minimum_nights                        10763 non-null  int64  
 25  minimum_maximum_nights                        10763 non-null  int64  
 26  maximum_maximum_nights                        10763 non-null  int64  
 27  minimum_nights_avg_ntm                        10763 non-null  float64
 28  maximum_nights_avg_ntm                        10763 non-null  float64
 29  has_availability                              10763 non-null  float64
 30  availability_30                               10763 non-null  int64  
 31  availability_60                               10763 non-null  int64  
 32  availability_90                               10763 non-null  int64  
 33  availability_365                              10763 non-null  int64  
 34  number_of_reviews                             10763 non-null  int64  
 35  number_of_reviews_ltm                         10763 non-null  int64  
 36  number_of_reviews_l30d                        10763 non-null  int64  
 37  review_scores_accuracy                        10763 non-null  float64
 38  review_scores_cleanliness                     10763 non-null  float64
 39  review_scores_checkin                         10763 non-null  float64
 40  review_scores_communication                   10763 non-null  float64
 41  review_scores_location                        10763 non-null  float64
 42  review_scores_value                           10763 non-null  float64
 43  instant_bookable                              10763 non-null  int64  
 44  calculated_host_listings_count                10763 non-null  int64  
 45  calculated_host_listings_count_entire_homes   10763 non-null  int64  
 46  calculated_host_listings_count_private_rooms  10763 non-null  int64  
 47  calculated_host_listings_count_shared_rooms   10763 non-null  int64  
 48  reviews_per_month                             10763 non-null  float64
dtypes: float64(26), int32(3), int64(20)
memory usage: 3.9 MB

Feature Selection¶

We remove one feature at a time, rerun the base model, and compare R2 against the current baseline of 0.6581. If dropping a feature raises R2, we can remove it permanently and repeat the check:

In [108]:
feat_sel = clean_data.copy()
In [109]:
for feature in feat_sel.columns:
    if feature != 'price':
        print(f'{feature}:-')
        _, _, _, feature_drop, _ = impute_feature(feature, feat_sel)
        print(f"""Feature Drop:- {model_check(feature_drop)}\n""")
host_since:-
Feature Drop:- MSE: 9625.765810067349, MAE: 52.53617510450535, RMSE: 98.11098720361215, R2: 0.6197213157817247

host_response_time:-
Feature Drop:- MSE: 9403.362930631678, MAE: 52.271658151416624, RMSE: 96.97093858796912, R2: 0.6285076374133707

host_response_rate:-
Feature Drop:- MSE: 9144.446175406409, MAE: 51.62478402229448, RMSE: 95.62659763583774, R2: 0.638736488285283

host_acceptance_rate:-
Feature Drop:- MSE: 9301.262796202971, MAE: 52.29496284254528, RMSE: 96.44305468100319, R2: 0.6325412390555849

host_is_superhost:-
Feature Drop:- MSE: 8913.793923049234, MAE: 51.40878077101719, RMSE: 94.41289066144111, R2: 0.6478487123689638

host_neighbourhood:-
Feature Drop:- MSE: 8251.50850817464, MAE: 50.50591267998142, RMSE: 90.83781430755938, R2: 0.6740131787724637

host_listings_count:-
Feature Drop:- MSE: 8658.8173058523, MAE: 51.08977705527171, RMSE: 93.05276624503057, R2: 0.6579219028462001

host_total_listings_count:-
Feature Drop:- MSE: 9201.8425395843, MAE: 51.70421040408733, RMSE: 95.92623488693957, R2: 0.6364689685596676

host_verifications:-
Feature Drop:- MSE: 9075.372496980957, MAE: 51.85198792382722, RMSE: 95.26474949833731, R2: 0.6414653358454714

host_identity_verified:-
Feature Drop:- MSE: 8898.117051799814, MAE: 51.51005341384115, RMSE: 94.32983118716906, R2: 0.6484680480238116

neighbourhood_cleansed:-
Feature Drop:- MSE: 9098.23823848119, MAE: 52.09086391082211, RMSE: 95.38468555528812, R2: 0.6405619943074621

latitude:-
Feature Drop:- MSE: 9551.209884486763, MAE: 54.0278959591268, RMSE: 97.73029153996607, R2: 0.6226667468092288

longitude:-
Feature Drop:- MSE: 8800.232519925687, MAE: 52.409958197863446, RMSE: 93.80955452365014, R2: 0.652335106678769

property_type:-
Feature Drop:- MSE: 9115.67557793776, MAE: 52.00793776126335, RMSE: 95.47604714239986, R2: 0.6398731090140055

room_type:-
Feature Drop:- MSE: 8835.153132535996, MAE: 51.15077101718533, RMSE: 93.99549527789083, R2: 0.6509555214200402

accommodates:-
Feature Drop:- MSE: 9094.510992835578, MAE: 52.97577566186716, RMSE: 95.36514558703078, R2: 0.6407092441053315

bathrooms:-
Feature Drop:- MSE: 9143.072434788668, MAE: 53.109312587087786, RMSE: 95.61941452858132, R2: 0.638790759735979

bedrooms:-
Feature Drop:- MSE: 9725.2821998026, MAE: 52.14912447747329, RMSE: 98.61684541599675, R2: 0.6157897884109784

beds:-
Feature Drop:- MSE: 8378.359680933581, MAE: 50.89205294937297, RMSE: 91.53338014589859, R2: 0.6690017544328166

amenities:-
Feature Drop:- MSE: 8540.55818023688, MAE: 51.478077101718526, RMSE: 92.41514042751263, R2: 0.6625938869327859

minimum_nights:-
Feature Drop:- MSE: 9442.976488829541, MAE: 51.64024616813748, RMSE: 97.17497871792688, R2: 0.6269426510958216

maximum_nights:-
Feature Drop:- MSE: 9146.358621110077, MAE: 51.686841616349284, RMSE: 95.63659666210461, R2: 0.6386609345734868

minimum_minimum_nights:-
Feature Drop:- MSE: 9562.209349117513, MAE: 52.44162563864375, RMSE: 97.78654993974126, R2: 0.6222321983255631

maximum_minimum_nights:-
Feature Drop:- MSE: 9389.062629958196, MAE: 51.79325592196935, RMSE: 96.89717555201594, R2: 0.629072589816259

minimum_maximum_nights:-
Feature Drop:- MSE: 9847.084164015328, MAE: 52.532635856943806, RMSE: 99.23247534963203, R2: 0.6109778397722971

maximum_maximum_nights:-
Feature Drop:- MSE: 9303.205598618208, MAE: 52.211377148165354, RMSE: 96.45312643257454, R2: 0.6324644860615126

minimum_nights_avg_ntm:-
Feature Drop:- MSE: 9766.170818218763, MAE: 52.38351370181143, RMSE: 98.82393848769013, R2: 0.6141744291431953

maximum_nights_avg_ntm:-
Feature Drop:- MSE: 9860.072622944726, MAE: 53.137524384579656, RMSE: 99.29789838130878, R2: 0.6104647134227522

has_availability:-
Feature Drop:- MSE: 9184.65209803762, MAE: 52.02367162099396, RMSE: 95.83659060107273, R2: 0.6371480998227268

availability_30:-
Feature Drop:- MSE: 9151.738380712959, MAE: 51.75203669298653, RMSE: 95.6647185785489, R2: 0.63844840001327

availability_60:-
Feature Drop:- MSE: 9180.875241848584, MAE: 51.3397631212262, RMSE: 95.81688390804923, R2: 0.637297309551115

availability_90:-
Feature Drop:- MSE: 10003.288638248954, MAE: 52.439739897817006, RMSE: 100.01644183957433, R2: 0.6048067742069533

availability_365:-
Feature Drop:- MSE: 10043.63491254064, MAE: 52.48329307942406, RMSE: 100.21793707984934, R2: 0.6032128409653296

number_of_reviews:-
Feature Drop:- MSE: 9010.185795529493, MAE: 51.500550394797955, RMSE: 94.92199848048656, R2: 0.64404062320916

number_of_reviews_ltm:-
Feature Drop:- MSE: 8918.248273153738, MAE: 51.33523455643288, RMSE: 94.43647744994377, R2: 0.6476727373421232

number_of_reviews_l30d:-
Feature Drop:- MSE: 9217.58097159777, MAE: 51.88913608917789, RMSE: 96.00823387396402, R2: 0.6358472008649372

review_scores_accuracy:-
Feature Drop:- MSE: 9597.270765873202, MAE: 52.24906874129122, RMSE: 97.9656611567196, R2: 0.6208470504117486

review_scores_cleanliness:-
Feature Drop:- MSE: 9162.663587888992, MAE: 51.75394101254064, RMSE: 95.72180309568448, R2: 0.6380167851691431

review_scores_checkin:-
Feature Drop:- MSE: 9066.90647520901, MAE: 51.788892243381326, RMSE: 95.22030495230001, R2: 0.6417997972985676

review_scores_communication:-
Feature Drop:- MSE: 9272.672662157456, MAE: 52.085520204366, RMSE: 96.29471772718095, R2: 0.6336707303366924

review_scores_location:-
Feature Drop:- MSE: 9165.49740116117, MAE: 51.61564328843474, RMSE: 95.73660429094595, R2: 0.6379048316057874

review_scores_value:-
Feature Drop:- MSE: 9447.420761054344, MAE: 52.02047840222944, RMSE: 97.19784339713686, R2: 0.6267670742090259

instant_bookable:-
Feature Drop:- MSE: 9121.619609823501, MAE: 51.83424059451927, RMSE: 95.50717046286891, R2: 0.6396382821265596

calculated_host_listings_count:-
Feature Drop:- MSE: 9430.414378251277, MAE: 52.14480492336275, RMSE: 97.11032065775129, R2: 0.6274389339865513

calculated_host_listings_count_entire_homes:-
Feature Drop:- MSE: 9238.65334006038, MAE: 52.43232698560148, RMSE: 96.11791373131432, R2: 0.6350147089146403

calculated_host_listings_count_private_rooms:-
Feature Drop:- MSE: 9124.77942595216, MAE: 51.79639804923362, RMSE: 95.52371132840348, R2: 0.6395134493866522

calculated_host_listings_count_shared_rooms:-
Feature Drop:- MSE: 9426.582823513703, MAE: 51.938678588016714, RMSE: 97.09059080834612, R2: 0.6275903046538691

reviews_per_month:-
Feature Drop:- MSE: 9381.561797770553, MAE: 51.587148165350676, RMSE: 96.85846270600496, R2: 0.629368920170762

Dropping most features leaves R2 at or below the 0.6581 baseline. Only three drops improve it:

  • host_neighbourhood
    • R-Square: 0.6740131787724637
  • beds
    • R-Square: 0.6690017544328166
  • amenities
    • R-Square: 0.6625938869327859

Dropping host_neighbourhood gives the highest R2 (0.6740), so we remove it.
We then rerun the base model and repeat the drop-one check to see whether any further removal raises R2.

In [110]:
feat_sel = feat_sel.drop(columns = ['host_neighbourhood'])
model_check(feat_sel)
Out[110]:
'MSE: 8251.50850817464, MAE: 50.50591267998142, RMSE: 90.83781430755938, R2: 0.6740131787724637'
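
The drop-one-and-recheck procedure used here is a form of greedy backward elimination. The sketch below shows that loop in isolation, under stated assumptions: `backward_eliminate` and `toy_score` are hypothetical names, and the toy scoring function stands in for a call that would drop the listed columns and read R2 from the base model.

```python
def backward_eliminate(features, score_fn):
    """Greedy backward elimination.

    Repeatedly drops the single feature whose removal raises the score
    the most, and stops as soon as no removal improves on the current score.
    """
    kept = list(features)
    best = score_fn(kept)
    dropped = []
    while len(kept) > 1:
        # Score every subset that has exactly one feature removed.
        trials = [
            (score_fn([f for f in kept if f != candidate]), candidate)
            for candidate in kept
        ]
        trial_score, candidate = max(trials)
        if trial_score <= best:
            break  # no single drop helps any more
        kept.remove(candidate)
        dropped.append(candidate)
        best = trial_score
    return kept, dropped, best


# Toy stand-in for the base model: each feature adds or subtracts a fixed
# amount of "R2". These numbers are made up for illustration.
toy_effect = {"bedrooms": 0.30, "accommodates": 0.25, "noisy_id": -0.04}


def toy_score(columns):
    return round(sum(toy_effect[c] for c in columns), 6)


kept, dropped, best = backward_eliminate(toy_effect, toy_score)
print(kept, dropped, best)
```

In the notebook, the manual equivalent is one pass of the loop per cell: score every single-feature drop, remove the best one if it beats the baseline, then score again.
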
In [111]:
for feature in feat_sel.columns:
    if feature != 'price':
        print(f'{feature}:-')
        _, _, _, feature_drop, _ = impute_feature(feature, feat_sel)
        print(f"""Feature Drop:- {model_check(feature_drop)}\n""")
host_since:-
Feature Drop:- MSE: 9318.468203913146, MAE: 51.94708546214585, RMSE: 96.53221329645946, R2: 0.6318615165343264

host_response_time:-
Feature Drop:- MSE: 9123.627227264284, MAE: 51.74047840222945, RMSE: 95.51768018154694, R2: 0.6395589685286747

host_response_rate:-
Feature Drop:- MSE: 9094.911053843474, MAE: 51.50447515095216, RMSE: 95.36724308610097, R2: 0.6406934391629828

host_acceptance_rate:-
Feature Drop:- MSE: 8797.917992487228, MAE: 51.00088016720854, RMSE: 93.79721740268859, R2: 0.652426545164418

host_is_superhost:-
Feature Drop:- MSE: 9414.410384463537, MAE: 51.84169066418951, RMSE: 97.02788457172267, R2: 0.6280711930524714

host_listings_count:-
Feature Drop:- MSE: 9342.664672909894, MAE: 52.30148397584765, RMSE: 96.65746051345387, R2: 0.6309056028361988

host_total_listings_count:-
Feature Drop:- MSE: 9399.028499187181, MAE: 52.159577333952626, RMSE: 96.94858688597364, R2: 0.6286788748940109

host_verifications:-
Feature Drop:- MSE: 8890.64937882025, MAE: 51.55482117974918, RMSE: 94.29024010373635, R2: 0.648763068379685

host_identity_verified:-
Feature Drop:- MSE: 9448.848178100323, MAE: 51.73062703204831, RMSE: 97.20518596299439, R2: 0.6267106821996226

neighbourhood_cleansed:-
Feature Drop:- MSE: 9210.063616569902, MAE: 51.76951463074779, RMSE: 95.96907635571941, R2: 0.6361441839762247

latitude:-
Feature Drop:- MSE: 9091.143155794241, MAE: 53.721985601486296, RMSE: 95.34748636327149, R2: 0.6408422949881405

longitude:-
Feature Drop:- MSE: 8962.71203493962, MAE: 52.4985578262889, RMSE: 94.67160099491092, R2: 0.6459161372792331

property_type:-
Feature Drop:- MSE: 9511.673690501626, MAE: 51.99487227124942, RMSE: 97.5278098313585, R2: 0.6242286767506307

room_type:-
Feature Drop:- MSE: 9204.469667928472, MAE: 51.801490942870416, RMSE: 95.93992739171982, R2: 0.6363651803593616

accommodates:-
Feature Drop:- MSE: 9400.587022050628, MAE: 53.26331398049234, RMSE: 96.95662443613962, R2: 0.6286173033748683

bathrooms:-
Feature Drop:- MSE: 9605.223566500232, MAE: 54.08470274036228, RMSE: 98.00624248740604, R2: 0.6205328644427595

bedrooms:-
Feature Drop:- MSE: 9915.770128588018, MAE: 52.89165815141662, RMSE: 99.57796005436151, R2: 0.6082643093636675

beds:-
Feature Drop:- MSE: 8816.090090118438, MAE: 51.31010915002323, RMSE: 93.89403649922842, R2: 0.6517086322717675

amenities:-
Feature Drop:- MSE: 8816.516199326521, MAE: 51.5046214584301, RMSE: 93.89630556803884, R2: 0.6516917982606165

minimum_nights:-
Feature Drop:- MSE: 8900.99473643753, MAE: 51.183562470970735, RMSE: 94.34508326583601, R2: 0.648354361263795

maximum_nights:-
Feature Drop:- MSE: 8994.315453274501, MAE: 51.67609382257316, RMSE: 94.83836488085663, R2: 0.6446676021934721

minimum_minimum_nights:-
Feature Drop:- MSE: 9435.714597375754, MAE: 52.44083604273108, RMSE: 97.1376065042564, R2: 0.6272295417787523

maximum_minimum_nights:-
Feature Drop:- MSE: 9548.531008546213, MAE: 52.02593125870878, RMSE: 97.71658512528062, R2: 0.6227725793671736

minimum_maximum_nights:-
Feature Drop:- MSE: 9005.201583534603, MAE: 51.503669298653044, RMSE: 94.89574059742937, R2: 0.644237531134896

maximum_maximum_nights:-
Feature Drop:- MSE: 8995.77228597306, MAE: 51.40321411983279, RMSE: 94.84604517834711, R2: 0.6446100480795796

minimum_nights_avg_ntm:-
Feature Drop:- MSE: 9376.265891105435, MAE: 51.997756618671616, RMSE: 96.83112046808833, R2: 0.6295781420091165

maximum_nights_avg_ntm:-
Feature Drop:- MSE: 9035.958946969344, MAE: 51.36028564793312, RMSE: 95.05766116925739, R2: 0.6430224205735364

has_availability:-
Feature Drop:- MSE: 9279.659596017187, MAE: 51.75809800278681, RMSE: 96.33098980087969, R2: 0.6333947022193118

availability_30:-
Feature Drop:- MSE: 9729.831380213656, MAE: 52.53590803529958, RMSE: 98.6399076449976, R2: 0.6156100669867174

availability_60:-
Feature Drop:- MSE: 9394.82174004877, MAE: 51.73112633534603, RMSE: 96.92688863286993, R2: 0.6288450685103468

availability_90:-
Feature Drop:- MSE: 9542.934187923827, MAE: 51.92574082675336, RMSE: 97.68794289943784, R2: 0.6229936892117389

availability_365:-
Feature Drop:- MSE: 9579.147125290294, MAE: 52.31841151881097, RMSE: 97.87311748018602, R2: 0.6215630489442399

number_of_reviews:-
Feature Drop:- MSE: 9436.110601730144, MAE: 52.007954017649794, RMSE: 97.13964485075157, R2: 0.6272138970998973

number_of_reviews_ltm:-
Feature Drop:- MSE: 9329.351991546679, MAE: 52.18792382721784, RMSE: 96.5885707086852, R2: 0.6314315380243295

number_of_reviews_l30d:-
Feature Drop:- MSE: 8974.993729737575, MAE: 51.948293079424054, RMSE: 94.73644351429694, R2: 0.6454309325869632

review_scores_accuracy:-
Feature Drop:- MSE: 9183.790988132838, MAE: 52.20270320483046, RMSE: 95.8320979011356, R2: 0.6371821191151155

review_scores_cleanliness:-
Feature Drop:- MSE: 9540.365933139805, MAE: 52.076479331165814, RMSE: 97.67479681647566, R2: 0.6230951515337273

review_scores_checkin:-
Feature Drop:- MSE: 9159.559443892244, MAE: 52.07845796562936, RMSE: 95.70558731804661, R2: 0.6381394185074145

review_scores_communication:-
Feature Drop:- MSE: 9112.987361762656, MAE: 51.2015490013934, RMSE: 95.46196814314409, R2: 0.6399793105703423

review_scores_location:-
Feature Drop:- MSE: 9514.154059800278, MAE: 52.03165350673479, RMSE: 97.54052521798454, R2: 0.6241306864616627

review_scores_value:-
Feature Drop:- MSE: 9158.146902728751, MAE: 51.818855085926614, RMSE: 95.69820741648587, R2: 0.6381952228252896

instant_bookable:-
Feature Drop:- MSE: 8998.674341558291, MAE: 51.62675104505342, RMSE: 94.86134271429164, R2: 0.6444953985128589

calculated_host_listings_count:-
Feature Drop:- MSE: 9173.602801823039, MAE: 51.90013237343242, RMSE: 95.77892671054025, R2: 0.6375846169694066

calculated_host_listings_count_entire_homes:-
Feature Drop:- MSE: 9574.735241581515, MAE: 52.15449837436136, RMSE: 97.85057609223114, R2: 0.6217373462796212

calculated_host_listings_count_private_rooms:-
Feature Drop:- MSE: 9041.59392575476, MAE: 51.91689270784951, RMSE: 95.08729634264905, R2: 0.6427998032399759

calculated_host_listings_count_shared_rooms:-
Feature Drop:- MSE: 9348.178170901067, MAE: 51.49002786809103, RMSE: 96.68597711613131, R2: 0.6306877847630299

reviews_per_month:-
Feature Drop:- MSE: 8586.394411483976, MAE: 50.614781699953554, RMSE: 92.66279950165533, R2: 0.6607830656379294

No further single-feature drop beats the current R2 of 0.6740; the best candidate, reviews_per_month, reaches only 0.6608. We therefore stop here and keep the remaining 47 features.

Removing Outliers¶

In [112]:
# Getting Q1 (25th percentile) and Q3 (75th percentile)
Q1 = feat_sel['price'].quantile(0.25)
Q3 = feat_sel['price'].quantile(0.75)

# Calculating the IQR
IQR = Q3 - Q1

# Defining the outlier bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Filtering out the outliers
filtered_data = feat_sel[(feat_sel['price'] >= lower_bound) & (feat_sel['price'] <= upper_bound)]

filtered_data.head(5)
Out[112]:
host_since host_response_time host_response_rate host_acceptance_rate host_is_superhost host_listings_count host_total_listings_count host_verifications host_identity_verified neighbourhood_cleansed ... review_scores_checkin review_scores_communication review_scores_location review_scores_value instant_bookable calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 176.0 2.0 100.0 38.0 1.0 5.0 10.0 7.0 1.0 122 ... 4.64 4.76 4.86 4.67 0 5 5 0 0 0.25
1 172.0 1.0 77.0 62.0 1.0 9.0 19.0 6.0 1.0 104 ... 4.79 4.84 4.95 4.23 0 9 7 2 0 0.40
2 172.0 1.0 77.0 62.0 1.0 9.0 19.0 6.0 1.0 6 ... 4.63 4.69 4.92 4.21 0 9 7 2 0 0.53
3 172.0 1.0 77.0 62.0 1.0 9.0 19.0 6.0 1.0 104 ... 4.50 4.80 4.85 4.25 0 9 7 2 0 0.14
4 172.0 1.0 77.0 62.0 1.0 9.0 19.0 6.0 1.0 23 ... 4.80 4.86 4.92 4.50 0 9 7 2 0 0.36

5 rows × 48 columns
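
The filter above keeps prices within 1.5 × IQR of the quartiles. Comparing the 10,763 rows before filtering with the 10,221 rows that reach the split below (8,176 + 2,045) suggests it removed 542 listings. As a small worked example of the same rule, the sketch below uses only the standard library; `iqr_bounds` is a hypothetical helper, the prices are made up, and `method="inclusive"` is chosen because it interpolates linearly between data points, which should agree with the default behaviour of pandas `quantile`.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier bounds using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up nightly prices with one obvious outlier.
prices = [50, 60, 70, 80, 90, 100, 1000]

lower, upper = iqr_bounds(prices)
kept = [p for p in prices if lower <= p <= upper]

print(lower, upper)  # bounds derived from Q1 = 65 and Q3 = 95
print(kept)
```

Here Q1 = 65 and Q3 = 95, so IQR = 30 and the bounds are 20 and 140; only the 1000 listing falls outside.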

Splitting the Data¶

In [113]:
model_data = filtered_data.copy()
In [114]:
# Separating features (X) and target (y)

X = model_data.drop('price', axis = 1)
y = model_data['price']
In [115]:
# Splitting the data (test & train)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
In [116]:
# Printing the shapes

print("Training Features Shape:", X_train.shape)
print("Testing Features Shape:", X_test.shape)
print("Training Target Shape:", y_train.shape)
print("Testing Target Shape:", y_test.shape)
Training Features Shape: (8176, 47)
Testing Features Shape: (2045, 47)
Training Target Shape: (8176,)
Testing Target Shape: (2045,)
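
The shapes above follow from 10,221 rows and test_size = 0.2: 0.2 × 10,221 = 2,044.2, and the test count is rounded up to 2,045, leaving 8,176 rows for training. The sketch below reproduces only that arithmetic with a seeded shuffle. `split_indices` is a hypothetical helper, not scikit-learn's implementation, so the rows it selects will differ from those chosen by train_test_split.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row positions and cut them into train and test index lists."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # seeded, so the split is repeatable
    return order[n_test:], order[:n_test]


train_idx, test_idx = split_indices(10221)
print(len(train_idx), len(test_idx))
```

Fixing the seed matters here for the same reason random_state = 42 does in the notebook: every model is then evaluated on the same held-out rows.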

Models¶

1. Linear Regression¶

In [117]:
if os.path.exists('Models/PricePrediction/lr.pkl'):
    print('Model Exists. Loading the model')

    # Loading the model
    lr_model = joblib.load('Models/PricePrediction/lr.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    
    # Creating Linear Regression object
    lr_model = LinearRegression()
    
    # Fitting/Training the model
    lr_model.fit(X_train, y_train)

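    # Saving the model (creating the target folder first if it is missing)
    os.makedirs('Models/PricePrediction', exist_ok = True)
    joblib.dump(lr_model, 'Models/PricePrediction/lr.pkl')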


# Predicting
y_pred_lr = lr_model.predict(X_test)
Model Exists. Loading the model
In [118]:
# Metrics

mse_lr = mean_squared_error(y_test, y_pred_lr)
mae_lr = mean_absolute_error(y_test, y_pred_lr)
rmse_lr = np.sqrt(mse_lr)
r2_lr = r2_score(y_test, y_pred_lr)

print(f"Metrics:\nMSE: {mse_lr}, MAE: {mae_lr}, RMSE: {rmse_lr}, R2: {r2_lr}")
Metrics:
MSE: 3996.6172754860736, MAE: 47.69454383549999, RMSE: 63.21880476160613, R2: 0.6132730441656875
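
The four numbers reported for each model are related: RMSE is the square root of MSE (√3996.62 ≈ 63.22 above), and R2 compares the squared error against that of always predicting the mean. The sketch below spells out the standard definitions in plain Python on made-up values; `regression_metrics` is a hypothetical helper, not the scikit-learn functions used in the notebook.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R2 from two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # squared error of the model
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # squared error of the mean
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


# Made-up prices and predictions.
metrics = regression_metrics([3, 5, 7], [2, 5, 9])
print(metrics)
```

For these values the residuals are 1, 0 and -2, so MSE = 5/3, MAE = 1, and R2 = 1 - 5/8 = 0.375.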

2. Decision Tree¶

In [119]:
if os.path.exists('Models/PricePrediction/dt.pkl'):
    print('Model Exists. Loading the model')

    # Loading the model
    dt_model = joblib.load('Models/PricePrediction/dt.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    
    # Creating decision tree object
    dt_model = DecisionTreeRegressor(random_state = 42, max_depth = 6)
    
    # Fitting/Training the model
    dt_model.fit(X_train, y_train)

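    # Saving the model (creating the target folder first if it is missing)
    os.makedirs('Models/PricePrediction', exist_ok = True)
    joblib.dump(dt_model, 'Models/PricePrediction/dt.pkl')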


# Predicting
y_pred_dt = dt_model.predict(X_test)
Model Exists. Loading the model
In [120]:
# Metrics

mse_dt = mean_squared_error(y_test, y_pred_dt)
mae_dt = mean_absolute_error(y_test, y_pred_dt)
rmse_dt = np.sqrt(mse_dt)
r2_dt = r2_score(y_test, y_pred_dt)

print(f"Metrics:\nMSE: {mse_dt}, MAE: {mae_dt}, RMSE: {rmse_dt}, R2: {r2_dt}")
Metrics:
MSE: 3435.458525837511, MAE: 42.11962659151027, RMSE: 58.6127846620301, R2: 0.6675727681654002
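
Each model cell repeats the same load-or-train pattern. As a hedged sketch, it could be factored into one helper. `load_or_train` is a hypothetical name, and the example swaps joblib for the standard-library pickle module and uses a plain dictionary in place of a fitted model so that it runs without extra packages.

```python
import os
import pickle
import tempfile


def load_or_train(path, train_fn):
    """Load a pickled model from path, or build it with train_fn and save it.

    Returns (model, was_loaded).
    """
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh), True
    model = train_fn()
    # Create the target folder if needed before writing the file.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return model, False


# Demo with a stand-in "model" in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "Models", "demo.pkl")
    first_model, first_loaded = load_or_train(demo_path, lambda: {"coef": [1.0, 2.0]})
    second_model, second_loaded = load_or_train(demo_path, lambda: {"coef": [9.9]})

print(first_loaded, second_loaded, second_model)
```

The second call returns the saved object even though its training function differs. The same applies to the cells here: when "Model Exists. Loading the model" is printed, the metrics describe the saved model, so the .pkl file needs to be deleted after changing the data or hyperparameters.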

3. Random Forest¶

In [121]:
if os.path.exists('Models/PricePrediction/rf.pkl'):
    print('Model Exists. Loading the model')

    # Loading the model
    rf_model = joblib.load('Models/PricePrediction/rf.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    
    # Creating random forest object
    rf_model = RandomForestRegressor(n_estimators = 145, random_state = 42, n_jobs = -1, max_features = 12, bootstrap = False)
    
    # Fitting/Training the model
    rf_model.fit(X_train, y_train)

    # Saving the model
    joblib.dump(rf_model, 'Models/PricePrediction/rf.pkl')


# Predicting
y_pred_rf = rf_model.predict(X_test)
Model Exists. Loading the model
In [122]:
# Metrics

mse_rf = mean_squared_error(y_test, y_pred_rf)
mae_rf = mean_absolute_error(y_test, y_pred_rf)
rmse_rf = np.sqrt(mse_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print(f"Metrics:\nMSE: {mse_rf}, MAE: {mae_rf}, RMSE: {rmse_rf}, R2: {r2_rf}")
Metrics:
MSE: 2435.355771177054, MAE: 34.10725908439423, RMSE: 49.34932391813543, R2: 0.7643462811569113

4. XGBoost¶

In [123]:
if os.path.exists('Models/PricePrediction/xgb.pkl'):
    print('Model Exists. Loading the model')

    # Loading the model
    xgb_model = joblib.load('Models/PricePrediction/xgb.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    
    # Initializing the Model Object
    xgb_model = XGBRegressor(n_estimators = 130, learning_rate = 0.1, max_depth = 9, subsample = 0.8, colsample_bytree = 0.8, random_state = 42)
    
    # Fitting the model
    xgb_model.fit(X_train, y_train)

    # Saving the model
    joblib.dump(xgb_model, 'Models/PricePrediction/xgb.pkl')


# Predicting
y_pred_xgb = xgb_model.predict(X_test)
Model Exists. Loading the model
In [124]:
# Metrics
mse_xgb = mean_squared_error(y_test, y_pred_xgb)
mae_xgb = mean_absolute_error(y_test, y_pred_xgb)
rmse_xgb = np.sqrt(mse_xgb)
r2_xgb = r2_score(y_test, y_pred_xgb)


print(f"Metrics:\nMSE: {mse_xgb}, MAE: {mae_xgb}, RMSE: {rmse_xgb}, R2: {r2_xgb}")
Metrics:
MSE: 2395.0204097663013, MAE: 33.73718720321842, RMSE: 48.93894573615477, R2: 0.7682492747276329

Metrics Comparison¶

In [125]:
df_metrics = pd.DataFrame({'Model': ['Linear Regression', 'Decision Tree', 'Random Forest', 'XGBoost'], 
                           'MSE': [mse_lr, mse_dt, mse_rf, mse_xgb], 
                           'MAE': [mae_lr, mae_dt, mae_rf, mae_xgb],
                           'RMSE': [rmse_lr, rmse_dt, rmse_rf, rmse_xgb],
                           'R-squared': [r2_lr, r2_dt, r2_rf, r2_xgb]})

df_metrics
Out[125]:
Model MSE MAE RMSE R-squared
0 Linear Regression 3996.617275 47.694544 63.218805 0.613273
1 Decision Tree 3435.458526 42.119627 58.612785 0.667573
2 Random Forest 2435.355771 34.107259 49.349324 0.764346
3 XGBoost 2395.020410 33.737187 48.938946 0.768249

Best Model¶

In [126]:
# Row indices of the highest and lowest R-squared values
r2_idx_max = df_metrics['R-squared'].idxmax()
r2_idx_min = df_metrics['R-squared'].idxmin()

# Function - Highlighting the row
def highlight_max_r2(max_idx, min_idx):
    def highlight(s):
        return [
            'background-color: lightgreen' if i == max_idx 
            else 'background-color: lightcoral' if i == min_idx 
            else 'background-color: lightcyan' 
            for i in range(len(s))
        ]
    return df_metrics.style.apply(highlight, axis = 0).applymap(lambda _: 'font-weight: bold;', subset = [df_metrics.columns[0]])

# Highlighting the rows with the highest (green) and lowest (red) R-squared
highlighted_df = highlight_max_r2(r2_idx_max, r2_idx_min)

highlighted_df
Out[126]:
  Model MSE MAE RMSE R-squared
0 Linear Regression 3996.617275 47.694544 63.218805 0.613273
1 Decision Tree 3435.458526 42.119627 58.612785 0.667573
2 Random Forest 2435.355771 34.107259 49.349324 0.764346
3 XGBoost 2395.020410 33.737187 48.938946 0.768249

Best Metrics¶

In [127]:
# Creating a custom colormap from green to red
cmap = mpl.colors.LinearSegmentedColormap.from_list('green_red', ['lightgreen', 'lightcoral'])
cmap_r2 = mpl.colors.LinearSegmentedColormap.from_list('red_green', ['lightcoral', 'lightgreen'])

# Custom function to apply separate gradients
def apply_custom_gradient(styler):
    for col in ['MSE', 'MAE', 'RMSE']:
        styler = styler.background_gradient(subset = [col], cmap = cmap)
    styler = styler.background_gradient(subset=['R-squared'], cmap = cmap_r2)
    return styler

# Styling DataFrame
styled_df = (df_metrics.style.pipe(apply_custom_gradient).applymap(lambda _: 'font-weight: bold', subset = pd.IndexSlice[:, ['Model']]))

styled_df
Out[127]:
  Model MSE MAE RMSE R-squared
0 Linear Regression 3996.617275 47.694544 63.218805 0.613273
1 Decision Tree 3435.458526 42.119627 58.612785 0.667573
2 Random Forest 2435.355771 34.107259 49.349324 0.764346
3 XGBoost 2395.020410 33.737187 48.938946 0.768249

Model Comparison¶

In this analysis, we compared the performance of four regression models: Linear Regression, Decision Tree, Random Forest, and XGBoost. The following metrics were used to evaluate the models: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²).
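
To make the four metrics concrete, here is a minimal sketch that computes them by hand on a small made-up example. The function name `regression_metrics` and the toy values are illustrative assumptions. The notebook itself uses the scikit-learn functions `mean_squared_error`, `mean_absolute_error`, and `r2_score`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R-squared from two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n       # mean of squared errors
    mae = sum(abs(e) for e in errors) / n      # mean of absolute errors
    rmse = math.sqrt(mse)                      # same unit as the target (price)

    # R-squared: 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


# Toy data (made up for illustration, not taken from the listings dataset)
toy_actual = [100.0, 150.0, 200.0, 250.0]
toy_predicted = [110.0, 140.0, 190.0, 270.0]

metrics = regression_metrics(toy_actual, toy_predicted)
print(metrics)
```

Because RMSE and MAE are expressed in the same unit as the price, they are usually the easiest of the four to interpret for this task.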

1. Linear Regression:¶

  • MSE: 3996.6173
  • MAE: 47.6945
  • RMSE: 63.2188
  • R-squared: 0.61

Linear Regression reaches an R-squared of 0.61, meaning the model explains about 61% of the variance in the target variable. Its MSE, MAE, and RMSE are lower than in the previous analysis, but they remain the highest of the four models compared here. A linear relationship therefore captures more variance than before, yet it is not the best fit for this data.

2. Decision Tree:¶

  • MSE: 3435.4585
  • MAE: 42.1196
  • RMSE: 58.6128
  • R-squared: 0.67

The Decision Tree model improved significantly compared to the previous version, achieving an R-squared value of 0.67 and explaining 67% of the variance. Its error metrics (MSE, MAE, RMSE) have also decreased, indicating better predictions. However, it is still outperformed by ensemble methods like Random Forest and XGBoost.

3. Random Forest:¶

  • MSE: 2435.3558
  • MAE: 34.1073
  • RMSE: 49.3493
  • R-squared: 0.76

The Random Forest model continues to show strong performance, with an R-squared value of 0.76, explaining 76% of the variance in the target variable. Its lower MSE, MAE, and RMSE metrics reflect its ability to make more accurate predictions. This highlights the benefits of ensemble learning and the robustness of the model for this task.

4. XGBoost:¶

  • MSE: 2395.0204
  • MAE: 33.7372
  • RMSE: 48.9390
  • R-squared: 0.77

XGBoost is the top-performing model, narrowly ahead of Random Forest: its R-squared of 0.77 (0.768 versus 0.764) means it explains about 77% of the variance in the target variable, and its MSE, MAE, and RMSE are the lowest of the four models. The margin over Random Forest is small (RMSE 48.94 versus 49.35), so the two models are close in practice, but XGBoost is the strongest choice on this test set.


Summary:¶

  • XGBoost performed the best, with the highest R-squared value of 0.77 and the lowest error metrics of the four models.
  • Random Forest was a close second, with an R-squared value of 0.76, demonstrating strong predictive power and low error metrics.
  • Decision Tree improved significantly, achieving an R-squared value of 0.67, but it was outperformed by the ensemble methods.
  • Linear Regression, while improved, had the lowest R-squared value of 0.61, indicating it was the least effective in capturing the underlying relationships in the data.

Conclusion: Based on the evaluation metrics, XGBoost now emerges as the best-performing model for this price prediction task, demonstrating the highest accuracy and explaining the most variance in the target variable.
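
To put the differences between models in perspective, the following sketch expresses the RMSE gap as a percentage reduction. The RMSE values are copied from the Metrics Comparison table. The helper name `rmse_reduction_pct` is hypothetical.

```python
# RMSE values copied from the Metrics Comparison table
rmse = {
    "Linear Regression": 63.218805,
    "Decision Tree": 58.612785,
    "Random Forest": 49.349324,
    "XGBoost": 48.938946,
}


def rmse_reduction_pct(baseline, candidate):
    """Percentage by which `candidate` lowers RMSE relative to `baseline`."""
    return 100.0 * (rmse[baseline] - rmse[candidate]) / rmse[baseline]


vs_linear = rmse_reduction_pct("Linear Regression", "XGBoost")
vs_forest = rmse_reduction_pct("Random Forest", "XGBoost")

print(f"XGBoost vs Linear Regression: {vs_linear:.1f}% lower RMSE")
print(f"XGBoost vs Random Forest: {vs_forest:.1f}% lower RMSE")
```

The reduction relative to Linear Regression is a little over 22%, while the reduction relative to Random Forest is under 1%. A gap that small may not hold under a different train/test split, so the ranking of the two ensemble models should be read with some caution.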

Marketing Strategies¶

1. Existing Hosts:¶

Benefit of Using the Price Prediction Model: Existing hosts can leverage the price prediction model to optimize their pricing based on several factors, such as market trends, listing features, and demand. By plugging in their property details (e.g., location, room type, amenities, etc.), the model can provide a competitive price estimate, allowing them to adjust their pricing dynamically.

How It Can Help:

  • Dynamic Pricing: If an existing host wants to adjust their rates based on current trends or events (like holidays, local festivals, or seasonality), they can input their current listing details into the model and receive an adjusted price that aligns with the market.
  • Competitive Analysis: The price prediction model helps hosts understand the ideal pricing for their listing in comparison to similar properties. If their price is too high or low compared to similar listings, they can adjust to stay competitive.
  • Revenue Maximization: By adjusting prices based on model predictions, hosts can maximize their revenue. The model helps avoid underpricing (leading to missed revenue) or overpricing (leading to fewer bookings).

2. Potential Guests:¶

Benefit of Using the Price Prediction Model: Guests can use the price prediction model to understand the fair price for a property they’re interested in, helping them determine whether the listing is priced competitively based on its features and location.

How It Can Help:

  • Price Comparison: Guests can input the details of properties they are considering and compare the predicted price with the listing’s actual price to ensure they are getting a fair deal.
  • Budget Planning: By knowing the predicted price range for the type of property they want, guests can better plan their budget, especially for long stays or specific dates when prices might fluctuate due to demand.
  • Identifying Overpriced Listings: Guests can use the price prediction model to check if a listing is marked up unfairly. If the model predicts a price much lower than the listed price, they can negotiate or opt for other more competitively priced options.
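
As a rough illustration of the price comparison described above, the sketch below labels a listing by the gap between its listed price and a predicted price. The function name `price_gap_label`, the example prices, and the tolerance are all assumptions. The tolerance is set near the test-set MAE reported for XGBoost (about 34), on the reasoning that gaps smaller than the model's typical error are not meaningful.

```python
def price_gap_label(listed_price, predicted_price, tolerance):
    """Label a listing by how its listed price compares with the prediction."""
    gap = listed_price - predicted_price
    if gap > tolerance:
        return "above estimate"
    if gap < -tolerance:
        return "below estimate"
    return "within tolerance"


# Assumption: tolerance roughly equal to the model's mean absolute error
TOLERANCE = 34.0

# Hypothetical listings that all share a predicted price of 120
predicted = 120.0
labels = [price_gap_label(listed, predicted, TOLERANCE)
          for listed in (200.0, 130.0, 70.0)]

print(labels)
```

A label of "above estimate" only says that the listing is priced higher than the model expects from the features it was trained on. Factors the model does not see could still justify the difference.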

3. New Hosts:¶

Benefit of Using the Price Prediction Model: New hosts, with limited experience, can use the price prediction model to set the right price for their listings based on real market data. The model can serve as a guide to help them start strong, especially if they are unsure how to price their property effectively.

How It Can Help:

  • Initial Pricing Guidance: New hosts can enter their property details into the model and receive a suggested price range for their listing, ensuring they start with a competitive price. This takes the guesswork out of pricing and gives them a benchmark to build on.
  • Avoiding Underpricing or Overpricing: The model can help new hosts avoid the common pitfalls of underpricing (leading to missed revenue) or overpricing (leading to fewer bookings), especially when they are unfamiliar with market trends.
  • Optimizing for Occupancy: New hosts can also use the model to predict prices based on seasonality and local demand, ensuring they don’t miss out on potential bookings. For example, the model can show them when to increase prices during peak seasons or offer discounts during off-peak periods.

Summary of How Each Group Can Use the Model:¶

  • Existing Hosts: Use the model to optimize prices dynamically, maximize revenue, and ensure competitiveness in the market by adjusting prices based on market demand and similar listings.
  • Potential Guests: Use the model to compare prices across different listings, ensuring they are getting a good deal and staying within their budget. They can also check if the listings are priced fairly compared to the market.
  • New Hosts: Use the model to set initial pricing based on data, avoiding common mistakes like underpricing or overpricing. This ensures they start their journey with a competitive price that aligns with market trends and demand.

The price prediction model ultimately helps both hosts and guests make data-driven decisions, ensuring that hosts maximize revenue and guests get good value, which is critical in a competitive market like Airbnb.

Visualization¶

In [128]:
vis_data = model_data.copy()

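# Creating a `date` feature from `host_since` in the original data source.
# Note: this is the date the host joined, so the seasonal, monthly and weekday
# plots below group listings by host join date, not by booking date.
vis_data['date'] = pd.to_datetime(df['host_since'])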

1. Actual VS Predicted (Prices)¶

In [129]:
# Defining a temporary Data Frame for Actual & Predicted values (Random Forest predictions)
temp_df = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred_rf
})

# Visualization - ScatterPlot
fig = px.scatter(temp_df, x = 'Actual', y = 'Predicted', title = "Actual vs Predicted Prices")

# Ideal line (y = x): a perfect prediction would fall exactly on this line,
# so both axes must use the same start and end values
line_min = min(temp_df['Actual'].min(), temp_df['Predicted'].min())
line_max = max(temp_df['Actual'].max(), temp_df['Predicted'].max())

# Visualization - Adding the `ideal line`
fig.add_scatter(x = [line_min, line_max], 
                y = [line_min, line_max], 
                mode = 'lines', name = 'Ideal Line', line = dict(dash = 'dash', color = 'red'))

# Customizing the background color
fig.update_layout(
    plot_bgcolor = '#dfedff',
    paper_bgcolor = '#dfedff',
    width = 1000,
    height = 300
)

# Showing
fig.show()

2. Reviews Importance in Price¶

In [130]:
# Getting all the features required
correlation_matrix = vis_data[['price', 'review_scores_accuracy', 'review_scores_cleanliness', 
                                 'review_scores_checkin', 'review_scores_communication', 
                                 'review_scores_location', 'review_scores_value']].corr()

# Figure Size
plt.figure(figsize = (10, 5))

# Visualization - Heatmap
ax = sns.heatmap(correlation_matrix, annot = True, cmap = 'coolwarm', fmt = '.2f')

# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')

# Labeling - Title
plt.title('Price vs. Review Scores Heatmap')

# Showing
plt.show()
[Figure: Price vs. Review Scores Heatmap]

3. Price by Accommodates¶

In [131]:
# Figure Size
plt.figure(figsize = (10, 5))

# Visualization - Barplot
ax = sns.barplot(x = 'accommodates', y = 'price', data = vis_data, estimator = 'mean')

# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')

# Labeling
plt.title('Average Price by Accommodates')
plt.xlabel('Accommodates')
plt.ylabel('Average Price')

# Showing
plt.show()
[Figure: Average Price by Accommodates]

4. Seasonal Price Trend¶

In [132]:
# Function - In order to get Seasons based on the months
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

# Creating a new feature `season` using the `get_season` function
vis_data['season'] = vis_data['date'].dt.month.apply(get_season)

# Figure Size
plt.figure(figsize = (10, 2))

# Visualization - Lineplot
ax = sns.lineplot(x = 'season', y = 'price', data = vis_data, estimator = 'mean', ci = None)

# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')

# Labeling
plt.title('Price Trends Over Seasons')
plt.xlabel('Season')
plt.ylabel('Average Price')

# Customizing the x-axis (Season Names)
plt.xticks(['Winter', 'Spring', 'Summer', 'Fall'])

# Showing
plt.show()
[Figure: Price Trends Over Seasons]
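
When the x-axis holds season names as plain strings, the plotting library may draw them in sorted or first-seen order rather than calendar order. One way to make the order explicit, shown here as a sketch on made-up data, is to compute the seasonal means first and reindex them. The `SEASON_ORDER` list and the `demo` frame are assumptions for illustration. `get_season` mirrors the function defined in the cell above.

```python
import pandas as pd

# Assumed display order for the seasons
SEASON_ORDER = ["Winter", "Spring", "Summer", "Fall"]


def get_season(month):
    """Map a month number (1-12) to a season name."""
    if month in [12, 1, 2]:
        return "Winter"
    elif month in [3, 4, 5]:
        return "Spring"
    elif month in [6, 7, 8]:
        return "Summer"
    else:
        return "Fall"


# Toy data (made up, not taken from the listings dataset)
demo = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-15", "2020-04-10", "2020-07-04",
                            "2020-10-31", "2021-02-01"]),
    "price": [100.0, 120.0, 180.0, 140.0, 110.0],
})
demo["season"] = demo["date"].dt.month.apply(get_season)

# Average price per season, then force the rows into the chosen order
season_means = demo.groupby("season")["price"].mean().reindex(SEASON_ORDER)

print(season_means)
```

The resulting Series can then be plotted directly, with its index as the x values and its values as the y values, and the seasons will appear in the order given by `SEASON_ORDER`.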

5. Monthly Price Trend¶

In [133]:
# Extracting months from date feature
vis_data['month'] = vis_data['date'].dt.month

# Figure Size
plt.figure(figsize = (10, 2))

# Visualization - Lineplot
ax = sns.lineplot(x = 'month', y = 'price', data = vis_data, estimator = 'mean', ci = None)

# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')

# Labeling
plt.title('Price Trends Over Time (Monthly)')
plt.xlabel('Month')
plt.ylabel('Average Price')

# Customizing the x-axis (Month Names)
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

# Showing
plt.show()
[Figure: Price Trends Over Time (Monthly)]

6. Weekly Price Trend¶

In [134]:
# Extracting Weekdays
vis_data['weekday'] = vis_data['date'].dt.weekday

# Figure Size
plt.figure(figsize = (10, 2))

# Visualization - Lineplot
ax = sns.lineplot(x = 'weekday', y = 'price', data = vis_data, estimator = 'mean', ci = None)

# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')

# Labeling
plt.title('Price Trends Over Weekdays')
plt.xlabel('Day of the Week')
plt.ylabel('Average Price')

# Customizing the x-axis (Weekday Names)
plt.xticks(range(0, 7), ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])

# Showing
plt.show()
[Figure: Price Trends Over Weekdays]

References¶

  • PyPI (Python Package Index)
  • Scikit-learn Documentation
  • NumPy Documentation
  • GeeksforGeeks
  • Stack Overflow
  • JavaTpoint